Genepattern .cn File
Additional file 1 Example of a quality assessment report. Please use any ZIP-compatible software to extract the ZIP archive file into a folder and then open the index.html file in your web browser. The interactive report shows an example of quality assessment with samples in rows and performed tests in columns. The results are color-coded with green indicating no problems, yellow indicating a warning, and red suggesting the failure of a certain test on a certain sample. The user should review flagged samples and decide whether further actions are required. Clicking on a heading shows an overview plot for a particular test.
Clicking on a ‘+’ sign will expand the appropriate section, revealing detailed test results. Individual dots can be clicked on to provide the supporting analyses of tested FCM parameters underlying the final call. The example demonstrates a quality assessment report of a 96-well plate of a “Normal Donor” study performed by Becton, Dickinson and Company (BD) in order to measure immune responses to various infectious agents and cancer antigens among healthy young adults. The ≈ 8 GB of data from the mentioned study may be downloaded from.

Additional file 2 GenePattern Flow Cytometry Suite Modules. Please use any ZIP-compatible software to extract the ZIP archive file into a folder.
After extraction, the folder will contain 34 ZIP files, each of which represents one of the GenePattern modules in the GenePattern Flow Cytometry Suite. If you are hosting your own GenePattern server, you can install the GenePattern Flow Cytometry Suite from these module ZIP files by following “Modules & Pipelines”, “Install from zip” at your local GenePattern site. The latest version of these modules can always be obtained by navigating to the particular module at the GenePattern public server web site and following the “export” link.

Additional file 3 Source Codes of GenePattern Flow Cytometry Suite Modules. Please use any ZIP-compatible software to extract the ZIP archive file into a folder. There will be 35 folders after the extraction. The folder named lib contains the CFCS library that is required in order to compile the Java-based modules. In addition, the lib folder contains a ZIP compression tool that is used by the FCMSinglePanelQC and PlateQAFCS modules to compress the results when the server computer is a Windows-based machine.
This tool is not required for Linux/Unix or Mac-based servers. The additional 34 folders contain the source code of each of the modules in the GenePattern Flow Cytometry Suite.

Background
Traditional flow cytometry data analysis is largely based on interactive and time-consuming analysis of a series of two-dimensional representations of up to 20-dimensional data.
Recent technological advances have increased the amount of data generated by the technology and outpaced the development of data analysis approaches. While there are advanced tools available, including many R/BioConductor packages, these are only accessible programmatically and therefore out of reach for most experimentalists. GenePattern is a powerful genomic analysis platform with over 200 tools for analysis of gene expression, proteomics, and other data. A web-based interface provides easy access to these tools and allows the creation of automated analysis pipelines enabling reproducible research.

Results
In order to bring advanced flow cytometry data analysis tools to experimentalists without programmatic skills, we developed the GenePattern Flow Cytometry Suite. It contains 34 open source GenePattern flow cytometry modules covering methods from basic processing of flow cytometry standard (i.e., FCS) files to advanced algorithms for automated identification of cell populations, normalization and quality assessment.
Internally, these modules leverage functionality developed in R/BioConductor. Using the GenePattern web-based interface, they can be connected to build analytical pipelines.

Flow cytometry
Flow cytometry (FCM) is a technique for counting and examining microscopic particles, such as cells, by suspending them in a stream of fluid and passing them individually past a detector. It allows the simultaneous multi-parametric analysis of the physical and chemical characteristics of up to thousands of particles per second. For more than 30 years, FCM has been widely used by clinicians, immunologists, and cancer biologists to distinguish different cell types in mixed cell sub-populations, based on the expression of cellular markers.

GenePattern
GenePattern is a powerful web-based application offering easy access to over 180 tools for analysis of gene expression, proteomics, and other data. An additional 100 tools are under development and testing at this point.
These tools are provided in the form of modules, typically written in R, Java, Matlab, or Perl. GenePattern was originally released in 2004 and now has more than 22,000 users worldwide. Using a web-based interface, experimentalists can easily submit their data and choose suitable settings in order to perform complex analyses without detailed knowledge of the underlying programming language, algorithms and settings, allowing them to concentrate on the interpretation of biologically meaningful results. Besides executing various modules as standalone tools, users can also chain modules together to create automated analysis pipelines enabling reproducible in silico research. This is now also facilitated by a Microsoft Word add-in, part of the GenePattern Reproducible Research Document, that allows scientists to embed their pipelines in a text document.
Methods
GenePattern is a web-based tool running within an Apache Tomcat application server. The GenePattern Flow Cytometry Suite (GP FCM Suite) is implemented as a set of GenePattern modules. These are command line-based software applications with formally defined syntax, inputs and outputs. These definitions reside in a manifest file, packaged together with the source code of the module and documentation in a module ZIP archive. The GP FCM Suite modules are developed in R, Java and C.
The R-based modules extensively reuse many existing flow cytometry related R/BioConductor packages. Java-based modules reuse a Java library called CFCS, which is an open-source implementation of the Proposed API for reading and writing FCS files. The CFCS library was originally developed in 2003 by Tree Star, Inc. (Ashland, OR), is now maintained by our group at the British Columbia Cancer Agency (BCCA), and is available from the flowcyt website.
Results and discussion
We previously proposed a general FCM data analysis framework consisting of seven steps: (1) Quality assessment, (2) Normalization, (3) Outliers removal, (4) Automated gating, (5) Cluster labelling, (6) Feature extraction and (7) Interpretation. Except for interpretation, the GP FCM Suite addresses all these steps, commonly with multiple modules and approaches (Figure).
The last step – interpretation – is highly dependent on the actual experiment type (design, hypothesis, type of clinical test, etc.). Therefore, we currently leave it up to the user to choose an appropriate approach to interpret experimental findings.
In addition to the proposed framework, the GP FCM Suite contains several data preprocessing modules often required before the start of data analysis. Finally, manual gating was not considered in the general automated FCM data analysis framework. While we do not incorporate interactive manual gating in automated analysis pipelines, we still allow users to reuse results of manual gating for the analysis in GenePattern.

Data preview
The first essential step in data processing is commonly the review of the contents of an FCS data file. This becomes especially important if a user is not familiar with the details of the data. The GP FCM Suite provides a data preview module, which lists the meta information stored in the file and provides details such as the number of events in the file (i.e., the number of particles, such as cells, whose characteristics have been captured in the file) and the number of parameters in the file (i.e., the number of distinct characteristics measured). Output is available as either an HTML report (for human review) or an XML document (for further automated processing).
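As a rough illustration of the kind of information a data preview reports, the following Python sketch reads keyword/value pairs from a simplified FCS TEXT segment and summarizes the event and parameter counts. It is not the code of the GP FCM Suite module. The function names are hypothetical, the example segment is synthetic, and the parser assumes that no value contains an escaped (doubled) delimiter. The keywords $TOT, $PAR and $PnN are standard FCS keywords.

```python
def parse_fcs_text_segment(text):
    """Parse a simplified FCS TEXT segment into a keyword -> value dict.

    Assumption: the first character is the delimiter and no value
    contains an escaped (doubled) delimiter.
    """
    if not text:
        raise ValueError("empty TEXT segment")
    delimiter = text[0]
    fields = text[1:].split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field after the trailing delimiter
    if len(fields) % 2 != 0:
        raise ValueError("keyword without a value")
    # Keywords are upper-cased here so that lookups are uniform.
    return {fields[i].upper(): fields[i + 1] for i in range(0, len(fields), 2)}


def preview(text):
    """Summarize the number of events, parameters, and parameter names."""
    keywords = parse_fcs_text_segment(text)
    n_params = int(keywords["$PAR"])
    names = [keywords.get(f"$P{i}N", "?") for i in range(1, n_params + 1)]
    return {
        "events": int(keywords["$TOT"]),
        "parameters": n_params,
        "names": names,
    }


# Synthetic TEXT segment using "/" as the delimiter.
example = "/$TOT/10000/$PAR/3/$P1N/FSC-A/$P2N/SSC-A/$P3N/FITC-A/"
summary = preview(example)
print(summary)
```

A real FCS file additionally has a HEADER segment with byte offsets to the TEXT and DATA segments; the sketch skips that step and starts from the TEXT content itself.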
Adding and removing FCS keywords and parameters
We provide functionality for editing, adding or removing keyword/value pairs stored in the meta data section of FCS files (e.g., for de-identification of clinical data prior to sharing). In addition, we also offer modules to add or remove FCM parameters from data files. Adding a parameter is useful, for example, if calculated event (cell) features need to be stored. These may include assignments of cells into subpopulations as the result of a clustering algorithm. Removing parameters is useful for high-content experiments with many markers where only a subset is included in a manuscript.

Adjusting data scale
In most FCM applications, fluorescence signals of interest can range over several decades.
Several transformations have been developed to provide more complete, appropriate, and readily interpretable representations. Via a dedicated module (LogicleTransformFCS), the GP FCM Suite includes support for the Logicle / Biexponential data transformation, the de facto standard for contemporary visualization of FCM data. Additional transformations are supported via Gating-ML, including both FCM-specific transformations (e.g., Hyperlog, Split-scale) as well as more generic transformations (e.g., inverse hyperbolic sine, logarithmic).
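To illustrate why log-like transformations that accept zero and negative values are useful for compensated data, the following Python sketch compares a generic inverse hyperbolic sine transformation with a plain logarithm. This is a generic textbook form, not the Gating-ML parametrization and not the code of any GP FCM Suite module. The `cofactor` and `floor` parameters and their default values are assumptions chosen for illustration.

```python
import math


def asinh_transform(value, cofactor=150.0):
    """Generic inverse hyperbolic sine transform (hypothetical cofactor).

    Roughly linear near zero, roughly logarithmic for large |value|,
    and defined for negative inputs.
    """
    return math.asinh(value / cofactor)


def log_transform(value, floor=1.0):
    """Base-10 logarithm; values below `floor` are clamped (assumption)."""
    return math.log10(max(value, floor))


for raw in (-300.0, 0.0, 300.0, 100000.0):
    print(raw, round(asinh_transform(raw), 3), round(log_transform(raw), 3))
```

The logarithm cannot represent the negative value and collapses it onto the floor, whereas the inverse hyperbolic sine keeps it and stays symmetric around zero.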
Compensation
Compensation is the process whereby the fluorescence spillover originating from a fluorochrome other than the one specified for a particular detector is subtracted as a percentage of the signal from other detectors. The inherent overlap of emission spectra from antibody fluorescent labels makes compensation necessary before proceeding with further analysis. The GP FCM Suite provides the ability to perform compensation based on the fluorescence spillover matrix that is commonly included within FCS files, or using a compensation specification supplied externally.

Quality assessment
Data quality assessment (Figure, step 1) represents an important part of any data analysis, and quality control tests should be included at the beginning of data analysis and often at other steps of an analytical pipeline to identify differences in samples originating from changes in conditions that are probably not biologically motivated. Generally, methods establish a quality control criterion to give special consideration to abnormal samples or even exclude these from further analysis. Quality control tests in the GP FCM Suite are largely based on functionality implemented in the flowQ R/BioConductor package.
They include tests applicable to both plate-based and single panel FCM data (e.g., cell number test, time flow test, Probability Density Function (PDF) and medians test of forward and side scatter for cell debris). An interactive HTML report is created after the execution of selected quality assessment tests, displaying an overview table with rows corresponding to tested samples and columns to selected quality control tests. The results of these tests are color-coded with green indicating no problems, yellow indicating a warning, and red suggesting the failure of a certain test on a certain sample.
Clicking on the heading shows an overview plot for that particular test, and clicking on a particular sample/test result will reveal details about the execution of that test on that sample. It is left up to the user to review flagged samples and exclude individual samples from further analysis as appropriate. An example of a quality assessment report of a 96 well plate of a “Normal Donor” study performed by Becton, Dickinson and Company (BD) in order to measure immune responses to various infectious agents and cancer antigens among healthy young adults is included as Additional file.
Fingerprinting
Fingerprinting generates a description of the multivariate probability distribution function of FCM data by transforming raw FCM data into a fingerprint form suitable for data quality assessment purposes as well as direct input into conventional statistical analysis and empirical modeling software tools. Fingerprinting is independent of a presumptive functional form for the distribution, in contrast with model-based methods such as Gaussian Mixture Modeling. Within GenePattern, we implement FCM fingerprinting functionalities based on the flowFP R/BioConductor package.
This approach is computationally efficient and able to handle large flow cytometry data sets of arbitrary dimensionality.

Normalization
Between-sample variation in high throughput FCM data represents a significant challenge for analysis of large scale data sets, such as those derived from multi-center clinical trials. It is often hard to match biologically relevant cell populations across samples due to technical variation in sample acquisition and instrumentation differences. Thus, normalization of data is a critical step prior to analysis, particularly in large-scale data sets from clinical trials, where group specific differences may be subtle and patient-to-patient variation common. The GP FCM Suite includes a normalization method that removes technical between-sample variation by aligning prominent features (landmarks) in the raw data on a per-channel basis as described by Hahne et al.

Outliers removal
Before further analysis, users may want to perform initial data clean up, such as the removal of margin channel events. These may, for example, occur when the instrument detector voltages are set too high so that cells highly expressing certain markers create signals above the recordable range for corresponding parameters.
Events created by these cells will condense at the parameter top range value and eventually create artificial cell populations, which may cause problems for further analysis. The GP FCM Suite offers a module for data clean up, including the removal of saturated events and events believed to be caused by instrument errors (Figure, step 3).

Gating
Gating is an inherent component of FCM data analysis; it is a process where particles (i.e., cells) are subsetted according to physical and fluorescence characteristics.
These properties are reflected in parameter values of events stored in FCS files. In practice, gating corresponds to assigning classes (labels) to these events. This can be done either manually or automatically. While manual gating is still dominant in traditional FCM, automatic gating methods are becoming more important in contemporary and high throughput approaches. The GP FCM Suite supports both (Figure, step 4), as described below.

Manual gating
Manual gating involves a combination of biological domain knowledge and visual inspection of the data. Typically, gate boundaries are drawn interactively on a series of one- or two-dimensional data projections.
Within the GP FCM Suite, we do not support interactive manual gating, as virtually all experimentalists performing manual gating already use one of several commercially available software tools well suited for this purpose. However, the GP FCM Suite still supports non-interactive manual gating based on the input of Gating-ML files, an open XML-based standard for encoding gating and data transformations.

Number of sub-populations
Most clustering algorithms require some user input, such as the number of expected sub-populations to search for. Computationally, the estimation of the correct number of sub-populations present in a data set is difficult and may depend not only on the data but also on the goal of the particular analysis.
For example, cells that could be considered outliers in one case could also represent a rare population that may be important for classification of a certain disease, or they could indicate other useful information about the subject. Therefore, in the GP FCM Suite, we typically do not integrate the automated selection of the number of sub-populations in most clustering algorithms. Instead, we provide a separate module that investigates the data and suggests the number of sub-populations to the user. This is graphically supported by the output of the Bayesian Information Criterion (BIC) and the Integrated Completed Likelihood (ICL) score for a range of sub-population numbers.
Generally, these curves show how well the data can be modeled as a mixture of a certain number of populations. This approach has the advantage that the user may either accept the suggested value or select their own value based on prior knowledge and/or inspection of the BIC and/or ICL curves.

Cluster labeling
Independent clustering of multiple flow cytometry samples (e.g., from different patients) results in dividing each of the input data files into several subsets corresponding to cell sub-populations in each particular sample. Another analytical step (i.e., Figure, step 5) is required to match (label) these sub-populations across different samples.
This label matching is usually performed by comparison of the position of each of the identified sub-populations. In the GP FCM Suite, we offer modules for assigning labels to previously clustered data sets from multiple flow cytometry samples. The data may have been previously clustered by any clustering algorithm. The cluster matching is performed using model-based clustering of the means of the previously clustered sub-populations, and users can choose from several models to fit their data. In addition, the user must specify how many distinct sub-populations are expected to be found across all the previously clustered FCM data. Similar to estimating the number of sub-populations in a single sample, there is also a module in the GP FCM Suite that can help with this estimation across multiple samples.
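The idea of matching sub-populations across samples by clustering their means can be sketched in a few lines of Python. Note that this sketch deliberately swaps in a plain k-means procedure for the model-based clustering used by the GP FCM Suite modules, so it illustrates only the general principle. The function names, the sample names, and the initialization from the first k means are assumptions made for the example.

```python
import math


def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def label_clusters(sample_means, k, iterations=10):
    """Assign a shared label 0..k-1 to every per-sample cluster mean.

    sample_means maps a sample name to the list of mean vectors of the
    sub-populations found in that sample by any clustering algorithm.
    Simplification: plain k-means, initialized with the first k means.
    """
    points = [
        (name, index, mean)
        for name, means in sample_means.items()
        for index, mean in enumerate(means)
    ]
    centers = [list(mean) for _, _, mean in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for _, _, mean in points:
            groups[nearest(mean, centers)].append(mean)
        for j, members in enumerate(groups):
            if members:  # keep the old center if a group is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return {(name, index): nearest(mean, centers) for name, index, mean in points}


# Two samples, each clustered independently into two sub-populations.
sample_means = {
    "patient_1": [(1.0, 1.0), (5.0, 5.0)],
    "patient_2": [(5.2, 4.9), (0.9, 1.2)],
}
labels = label_clusters(sample_means, k=2)
print(labels)
```

In this example the first cluster of patient_2 receives the same label as the second cluster of patient_1, because their means lie close together. As in the text above, the number of distinct sub-populations k has to be supplied by the user.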
Feature extraction
The extraction of features (Figure, step 6) of identified sub-populations typically follows gating and, where applicable, labeling of FCM data. The main feature is simply the number (or proportion) of cells in different sub-populations (e.g., how many cells are positive or negative for specific markers). In addition, one may be interested in the mean value of selected parameters (e.g., the mean fluorescence intensity – MFI – of a certain population of cells). The MFI can, for example, indicate the cellular response after specific antigen stimulation.
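As a minimal sketch of the features just described, the following Python function computes the cell count, the proportion, and the mean fluorescence intensity of one parameter for each labeled sub-population. The function name, the labels, and the data values are hypothetical and are not taken from the GP FCM Suite modules.

```python
from statistics import mean


def extract_features(values, labels):
    """Per-population count, proportion, and mean of one parameter.

    values: fluorescence value of each event for a single parameter.
    labels: sub-population label of each event (same length as values).
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    total = len(labels)
    features = {}
    for label in sorted(set(labels)):
        members = [v for v, l in zip(values, labels) if l == label]
        features[label] = {
            "count": len(members),
            "proportion": len(members) / total,
            "mfi": mean(members),
        }
    return features


# Five events assigned to a marker-negative or marker-positive population.
values = [100.0, 120.0, 900.0, 1100.0, 80.0]
labels = ["neg", "neg", "pos", "pos", "neg"]
features = extract_features(values, labels)
print(features)
```

The labels would typically come from a preceding gating or clustering step, so the same function can be applied to each sample once sub-populations have been matched across samples.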
In the GP FCM Suite, we offer the calculation of cell number, proportion as well as mean parameter values. Other features that can be calculated include the integrated mean fluorescence intensity (iMFI), obtained by multiplying the cell proportion by the mean fluorescence value.

Interpretation
Interpretation of analytical results (Figure, step 7) is highly dependent on the actual experiment type, its design, the hypothesis being tested, the type of clinical test, etc. Therefore, it is likely impossible to create a generally applicable solution. In GenePattern, we created a few very specific modules to help researchers from BCCA test hypotheses related to their projects, such as the computational quantification of long-term reconstituting hematopoietic stem cells (HSC) from adult mouse bone marrow.
However, these modules rely on a very specific experimental design and tightly defined settings and protocols, and therefore, they are only useful for the laboratory they have been designed for. Consequently, we have not included these modules in the GP FCM Suite and we are leaving it up to the researchers to decide on the best way to interpret their experimental findings.

Availability and requirements
All GP FCM Suite modules are available from GenePattern; these can be run directly on the public server hosted at the Broad Institute of MIT and Harvard or downloaded for use with one's own installation of the GenePattern server.
Currently available modules are also included in Additional file and their source codes in Additional file. In addition, newer modules may be accessed from the GenePattern beta server before their official release through GenePattern. The GP FCM Suite is distributed as open source under the GNU LGPL 3.0 license. All used R libraries are freely available and their licensing conditions are specified on the download page of each specific library in the appropriate R repository, such as CRAN or BioConductor. GenePattern server software is freely available under the GenePattern License Agreement. The text of this license agreement is available at.
Conclusions
Traditional FCM data analysis involves the interpretation of individual two-dimensional scatter plots culled from sets of simultaneous analysis of highly multidimensional data. Recent technological advances have increased the amount of data generated by the FCM technology and outpaced the development of analytical approaches. While it is becoming clear that analysis methods based on manual gating are unsuitable for the increased amount of data and simultaneously measured fluorescence parameters, they still represent the main functionality in commercial FCM data analysis software.
The need for new analytical approaches has been well recognized by the research community; however, advanced tools being developed are commonly released in the form of programming libraries (such as R/BioConductor packages) and therefore only accessible programmatically. Little effort is invested into making these available via user-friendly interfaces that would make these tools accessible for experimentalists without advanced programming skills. In order to address this issue, we have developed the GP FCM Suite consisting of GenePattern modules to analyze FCM data. The modules in the GP FCM Suite can help with quality assessment, normalization, outliers removal, gating/clustering, cluster labeling, feature extraction and other tasks. To the best of our knowledge, there is no other software tool that would provide a variety of advanced algorithms for the computational analysis of flow cytometry data via a user-friendly interface.
However, a few software tools, most of them commercial, integrate one or two of these algorithms. For example, FlowJo allows users to utilize automated clustering for the purpose of analyzing flow cytometry data. Cytobank has recently included the Cyto Spanning tree Progression of Density normalized Events (SPADE) algorithm in their hosted versions of Cytobank and DVS Cytobank servers. The Immunology Database and Analysis Portal (ImmPort) integrates the FLOCK analysis (also available as part of the GP FCM Suite). GemStone (Verity Software House) offers a patented Probability State Modeling (PSM) technology to combine multiple samples and estimate missing parameter values. Finally, most of the major commercial third party software vendors, including Tree Star, De Novo Software, and Verity Software House, offer computational support for cell cycle analysis.
All these tools integrate some algorithms that help users wishing to apply computational methods to the analysis of flow cytometry data. While the scope and variety of implemented methods is limited compared to all the modules offered by the GP FCM Suite, the increasing commercial support clearly shows the new trend of users seeking advanced algorithms to help them analyze the growing amount of increasingly complex data. Users with programmatic skills will always get the most out of the advanced FCM analysis tools if they programmatically incorporate these in an analysis pipeline.
These users will have additional settings for various algorithms as well as the choice to encode more complex work flows compared to the options offered by GenePattern. However, we argue that most of the experimentalists have biology or medicine-related backgrounds and their programmatic skills are limited.
For them, having advanced analytical functionality accessible from a simple web-based user interface becomes very useful.

Abbreviations
BCCA: British Columbia Cancer Agency; BIC: Bayesian information criterion; CSV: Comma-separated values; ECDF: Empirical cumulative distribution function; FCS: Flow cytometry standard; FCM: Flow cytometry; HTML: HyperText Markup Language; GP FCM Suite: GenePattern flow cytometry suite; HSC: Hematopoietic stem cells; ICL: Integrated completed likelihood; iMFI: Integrated mean fluorescence intensity; MFI: Mean fluorescence intensity; MIT: Massachusetts Institute of Technology; PDF: Probability density function; SVM: Support vector machine; XML: eXtensible Markup Language.

Authors’ contributions
JS led the module development effort, developed the majority of modules and wrote the initial version of the manuscript. AB, KB and PW contributed to module development.
KB developed an automated module unit testing framework. PC and AB developed a high throughput extension of the GenePattern server. PC developed a mechanism for resolving module dependencies on the GenePattern server. YQ and RHS developed the ImmPort FLOCK modules.
PC, MDN, BAH, TL, and MR provided extensive support for module development, testing, integration and hosting on the public GenePattern server at Broad Institute as well as in their module repository. RRB coordinated the BCCA development team and collaborations with the Broad Institute, JPM coordinated collaboration from the Broad Institute side and RPS from the Vaccine and Gene Therapy Institute. All authors reviewed and approved the final version of the manuscript.
Krutzik PO, Crane JM, Clutter MR, Nolan GP. High-content single-cell drug screening with phosphospecific flow cytometry. Nat Chem Biol. 2008; 4(2):132–142. doi: 10.1038/nchembio.2007.59.
Darzynkiewicz Z, Crissman H, Jacobberger JW. Cytometry of the cell cycle: cycling through history. Cytom Part A. 2004; 58A:21–32. doi: 10.1002/cyto.a.20003.
De Rosa SC, Brenchley JM, Roederer M. Beyond six colors: a new era in flow cytometry. 2003; 9:112–117. doi: 10.1038/nm0103-112.
Mahnke YD, Roederer M. Optimizing a Multicolor Immunophenotyping Assay. Clin Lab Med. 2007; 27(3):469–485. doi: 10.1016/j.cll.2007.05.002.
DVS Sciences. CyTOF® Instrument.
Lugli E, Roederer M, Cossarizza A. Data analysis in flow cytometry: The future just started. Cytom Part A. 2010; 77A:705–713. doi: 10.1002/cyto.a.20901.
Bashashati A, Brinkman RR. A survey of flow cytometry data analysis methods. Adv Bioinformatics. 2009; 2009:1–19. Article ID 584603.
Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J. Bioconductor: open software development for computational biology and bioinformatics. 2004; 5(10):R80. doi: 10.1186/gb-2004-5-10-r80.
Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. 2006; 38(5):500–501. doi: 10.1038/ng0506-500.
Mesirov JP. Accessible reproducible research. 2010; 327(5964):415–416. doi: 10.1126/science.1179653.
Hahne F, LeMeur N, Brinkman R, Ellis B, Haaland P, Sarkar D, Spidlen J, Strain E, Gentleman R. FlowCore: a Bioconductor package for high throughput flow cytometry. BMC Bioinformatics.
Strain E, Hahne F, Brinkman RR, Haaland P. Analysis of high-throughput flow cytometry data using plateCore. Adv Bioinformatics. 2009; 2009:10. Article ID 356141.
Le Meur N, Rossini A, Gasparetto M, Smith C, Brinkman RR, Gentleman R. Data quality assessment of ungated flow cytometry data in high throughput experiments. Cytom Part A. 2007; 76(6):393–403.
Hahne F, Khodabakhshi AH, Bashashati A, Wong CJ, Gascoyne RD, Weng AP, Seifert-Margolis V, Bourcier K, Asare A, Lumley T, Gentleman R, Brinkman RR. Per-channel basis normalization methods for flow cytometry data. Cytom Part A. 2009; 77(2):121–131.
Lo K, Hahne F, Brinkman R, Gottardo R. FlowClust: a Bioconductor package for automated gating of flow cytometry data. BMC Bioinformatics. 2009; 10:145. doi: 10.1186/1471-2105-10-145.
Finak G, Bashashati A, Brinkman RR, Gottardo RR. Merging mixture components for cell population identification in flow cytometry. Adv Bioinformatics. 2009; 2009:1–12. Article ID 247646.
Zare H, Shooshtari P, Gupta A, Brinkman R. Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinformatics.
Aghaeepour N, Nikolic R, Hoos H, Brinkman R. Rapid cell population identification in flow cytometry data. Cytom Part A. 2011; 79:6–13.
Proposed API for Reading and Writing FCS files.
CFCS - Java library for Reading and Writing FCS files.
Spidlen J, Moore W, Parks D, Goldberg M, Bray C, Bierre P, Gorombey P, Hyun B, Hubbard M, Lange S, Lefebvre R, Leif RR, Novo D, Ostruszka L, Treister A, Wood J, Murphy RF, Roederer M, Sudar D, Zigon R, Brinkman RR. Data file standard for flow cytometry, version FCS 3.1. Cytom Part A. 2010; 77:97–100.
Parks DR, Roederer M, Moore WA. A new “Logicle” display method avoids deceptive effects of logarithmic scaling for low signals and compensated data. Cytom Part A. 2006; 69(6):541–551.
Moore WA, Parks DR. Update for the logicle data scale including operational code implementations. Cytom Part A. 2012; 81(4):273–277.
Spidlen J, Leif RC, Moore W, Roederer M, Brinkman RR. Gating-ML: XML-based gating descriptions in flow cytometry. Cytom Part A. 2008; 73(12):1151–1157.
Bagwell CB. Hyperlog – a flexible log-like transform for negative, zero, and positive valued data. Cytom Part A. 2005; 64:34–42.
Battye FL. A mathematically simple alternative to the logarithmic transform for flow cytometric fluorescence data displays.
Wang K, Ng SK, McLachlan GJ. Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. Digit Image Comput: Tech Appl. 2009; 0:526–531.
Rogers WT, Holyst HA. FlowFP: A bioconductor package for fingerprinting flow cytometric data. Adv Bioinformatics. Article ID 193947.
Naumann U, Luta G, Wand M. The curvHDR method for gating flow cytometry samples. BMC Bioinformatics. doi: 10.1186/1471-2105-11-44.
Achuthanandam R, Quinn J, Capocasale R, Bugelski P, Hrebien L, Kam M. Sequential univariate gating approach to study the effects of erythropoietin in murine bone marrow. Cytom Part A. 2008; 73(8):702–714.
Boedigheimer MJ, Ferbas J. Mixture modeling approach to flow cytometry data. Cytom Part A. 2008; 73(5):421–429.
Roederer M, Hardy RR. Frequency difference gating: A multivariate method for identifying subsets that differ between samples. Cytom Part A. 2001; 45:56–64. doi: 10.1002/1097-031)45:13.0.CO;2-9.
Pyne S, Hu X, Wang K, Rossin E, Lin T, Maier L, Baecher-Allan C, McLachlan G, Tamayo P, Hafler D, De Jager P, Mesirov J. Automated high-dimensional flow cytometric data analysis. Proc Natl Acad Sci. 2009; 106:8519–8524. doi: 10.1073/pnas.
Scheuermann RH, Qian Y, Wei C, Sanz I. ImmPort FLOCK: Automated cell population identification in high dimensional flow cytometry data. 2009; 182:42.17.
Qian Y, Wei C, Eun-Hyung Lee F, Campbell J, Halliley J, Lee JA, Cai J, Kong YM, Thomson E, Dunn P, Seegmiller AC, Karandikar NJ, Tipton CM, Mosmann T, Sanz I, Scheuermann RH. Elucidation of seventeen human peripheral blood B-cell subsets and quantification of the tetanus response using a density-based method for the automated identification of cell populations in multidimensional flow cytometry data. Cytom Part B: Clin Cytom. 2010; 78B:S69–S82.
Doi: 10.1002/cyto.b.20554. Bakker ST, De Grooth B. Cluster analysis of flow cytometric list mode data on a personal computer. Cytom Part A.
1993; 14(6):649–659. Doi: 10.1002/cyto.990140609.
Latorre I, De Souza-Galvao M, Ruiz-Manzano J, Lacoma A, Prat C, Fuenzalida L, Altet N, Ausina V, Dominguez J. Quantitative evaluation of T-cell response after specific antigen stimulation in active and latent tuberculosis infection in adults and children. Diagn Microbiol Infect Dis. 2009; 65(3):236–246.
Doi: 10.1016/j.diagmicrobio.2009.07.015. Darrah PA, Patel DT, De Luca PM, Davey DF, Flynn BJ, Hoff ST, Andersen P, Reed SG, Morris SL, Roederer M, Seder RA. Multifunctional TH1 cells define a correlate of vaccine-mediated protection against Leishmania major.
2007; 13(7):843–850. Doi: 10.1038/nm1592. Dykstra B, Kent D, Bowie M, McCaffrey L, Hamilton M, Lyons K, Lee SJ, Brinkman R, Eaves C. Long-term propagation of distinct hematopoietic differentiation programs In Vivo. Cell Stem Cell. 2007; 1(2):218–229. Doi: 10.1016/j.stem.2007.05.015.
Kotecha N, Krutzik PO, Irish JM. Web-based Analysis and Publication of Flow Cytometry Experiments. John Wiley and Sons, Inc: 111 River Street, Hoboken, NJ, USA. Current Protocols in Cytometry 2010 chap. Chapter 10, Unit 10.17. Qiu P, Simonds EF, Bendall SC, Gibbs KD, Bruggner RV, Linderman MD, Sachs K, Nolan GP, Plevritis SK.
Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat Biotechnol. 2011; 29:886–891. Doi: 10.1038/nbt.1991.
Parameters

These are from Hierarchical Clustering 7. Each entry gives the parameter name followed by its description.

input filename: Input data file name (.gct, .res, .pcl).

column distance measure: Distance measure for column (sample) clustering. Options include:

No column clustering.

Pearson correlation (default): Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. It measures how well a straight line can be fitted to a scatter plot of x and y. If all the points in the scatter plot lie on a straight line, the Pearson correlation coefficient is either +1 or -1, depending on whether the slope of the line is positive or negative. If it is equal to zero, there is no linear correlation between x and y.

Information coefficient: A measure of the information-theoretic similarity between two variables. The information coefficient (IC) is +1 or -1 if two variables contain essentially the same information (e.g., one variable is a scaled version of the other, or they are a mirror of each other). The IC is 0 only when two variables are statistically independent. Unlike the Pearson correlation, this metric is sensitive to nonlinear relationships between variables. Note that this metric is very computationally intensive, hence it will take significantly longer to run than any other metric in this list. For more information refer to:

Uncentered correlation: The same as the Pearson correlation, except that the sample means are set to zero in the expression for uncentered correlation. The uncentered correlation coefficient lies between -1 and +1; hence the distance lies between 0 and 2.

Uncentered correlation, absolute value: The same as the absolute Pearson correlation, except that the sample means are set to zero in the expression for uncentered correlation. The absolute uncentered correlation coefficient lies between 0 and +1; hence the distance lies between 0 and 1.

Pearson correlation, absolute value: The absolute value of the Pearson correlation coefficient is used; hence the corresponding distance lies between 0 and 1, just like the correlation coefficient.
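The correlation-based measures described so far can be illustrated with a minimal sketch. It assumes the distance is computed as 1 - r (or 1 - |r| for the absolute-value variants), which is consistent with the stated ranges of 0 to 2 and 0 to 1 but is not confirmed here as the module's exact formula. The function names are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def uncentered(x, y):
    """Same expression as Pearson, but with the sample means set to zero."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def correlation_distance(r, absolute=False):
    """Assumed conversion from a correlation r to a distance: 1 - r or 1 - |r|."""
    return 1.0 - (abs(r) if absolute else r)


# y decreases linearly as x increases, so the points lie on a falling line.
x = [1.0, 2.0, 3.0, 4.0]
y = [8.0, 6.0, 4.0, 2.0]

r_pearson = pearson(x, y)
r_uncentered = uncentered(x, y)

print(correlation_distance(r_pearson))                 # close to 2: perfectly anti-correlated
print(correlation_distance(r_pearson, absolute=True))  # close to 0: sign is ignored
print(correlation_distance(r_uncentered))              # differs, because the means are not removed
```

In this example the two profiles are maximally distant under the plain Pearson measure but nearly identical under the absolute-value variant, which is the practical difference between the two options.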
Spearman’s rank correlation: Nonparametric version of the Pearson correlation that measures the strength of association between two ranked variables. To calculate the Spearman rank correlation, each data value is replaced by its rank when the data in each vector are ordered by value. The Pearson correlation is then calculated between the two rank vectors instead of the data vectors. It is useful because it is more robust against outliers than the Pearson correlation.

Kendall’s tau: The Kendall tau distance is a metric that counts the number of pairwise disagreements between two lists. The larger the distance, the more dissimilar the two lists are.
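The two rank-based measures can be sketched as follows. This is an illustrative implementation under stated assumptions, not the module's code: ties receive the average of their ranks, and the Kendall tau distance is taken as the raw count of discordant pairs. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Replace each value by its 1-based rank; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Pearson correlation computed on the rank vectors instead of the data."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_distance(x, y):
    """Number of pairs (i, j) that the two lists order in opposite ways."""
    n = len(x)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )


# The last x value is an extreme outlier, but its rank is simply 5.
x = [10.0, 20.0, 30.0, 40.0, 1000.0]
y = [1.0, 2.0, 3.0, 5.0, 4.0]

rho = spearman(x, y)
disagreements = kendall_tau_distance(x, y)
print(rho, disagreements)
```

Because only ranks enter the calculation, replacing 1000.0 with any value above 40.0 leaves both results unchanged, which is the robustness to outliers mentioned above.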
Euclidean distance: Corresponds to the length of the shortest path between two points. Takes into account the difference between two samples directly, based on the magnitude of changes in the sample levels. This distance type is usually used for data sets that are normalized or without any special distribution problem.
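As a small illustration, the Euclidean distance can be compared with the city-block distance, the remaining option in this list. The sketch below uses only the textbook definitions; the function names are hypothetical and nothing here is taken from the module's source.

```python
import math


def euclidean_distance(a, b):
    """Length of the straight line between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def city_block_distance(a, b):
    """Sum of the absolute differences along each dimension."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Two samples measured on two features.
sample_a = [0.0, 0.0]
sample_b = [3.0, 4.0]

print(euclidean_distance(sample_a, sample_b))   # 5.0
print(city_block_distance(sample_a, sample_b))  # 7.0
```

Both measures depend directly on the magnitude of the values, so, unlike the correlation-based options, they are affected by differences in scale between samples.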
City-block distance: Also known as the Manhattan or taxi cab distance; the city-block distance is the sum of distances along each dimension between two points.

row distance measure: Distance measure for row (gene) clustering. Options include:

No row clustering (default).
Pearson correlation.
Information coefficient.
Uncentered correlation.
Uncentered correlation, absolute value.
Pearson correlation, absolute value.
Spearman’s rank correlation.
Kendall’s tau.
Euclidean distance.
City-block distance.

NOTE: Filtering beforehand is recommended since row clustering is computationally intensive.

clustering method: Hierarchical clustering method to use. Options include:
Pairwise complete-linkage: The distance between two clusters is computed as the maximum distance between a pair of objects, one in one cluster and one in another.

Pairwise average-linkage (default): The distance between two clusters is computed as the average distance between the elements in the two clusters.

row center: Specifies whether to center each row (gene) in the data. Centering each row subtracts the row-wise mean or median from the values in each row of data, so that the mean or median value of each row is 0. Default: no.

row normalize: Specifies whether to normalize each row (gene) in the data. Normalizing each row multiplies all values in each row of data by a scale factor S so that the sum of the squares of the values in each row is 1.0 (a separate S is computed for each row). Default: no.

column center: Specifies whether to center each column (sample) in the data. Centering each column subtracts the column-wise mean or median from the values in each column of data, so that the mean or median value of each column is 0. Default: no.

column normalize: Specifies whether to normalize each column (sample) in the data. Normalizing each column multiplies all values in each column of data by a scale factor S so that the sum of the squares of the values in each column is 1.0 (a separate S is computed for each column). Default: no.

output base name: Base name for the output files.

output distance matrix: Whether or not to output the pair-wise distance matrix. If true, the distance between each pair of columns will be computed, which can be very computationally intensive. If unsure, leave as False. Default: False.

Output Files

file: Contains the original data, but reordered to reflect the clustering.

file (if clustering by columns/samples) or file (if clustering by rows/genes): These files describe the order in which nodes were joined during the clustering.

distancematrix.txt: A tab-separated file that contains all the distances used to compute the clustering.

License

HierarchicalClustering is distributed under a modified BSD license.

Platform Dependencies