GeneTrail2 1.6
Statistical analysis of molecular signatures

Tutorial for GeneTrail 2:
GeneTrail 2 offers two main options for data input: you can either download preprocessed, normalized expression data from GEO or provide a precomputed list of scores.
Gene Expression Omnibus
The Gene Expression Omnibus (GEO) is a MIAME-compliant online database for microarray experiments. Normalized data is stored in the GEO SOFT format, whereas unprocessed data is stored in a platform-dependent raw format. Currently, GeneTrail 2 supports the SOFT format for the following organisms:
Supported Organisms
- Homo sapiens (9606)
- Mus musculus (10090)
- Rattus norvegicus (10116)
- Arabidopsis thaliana (3702)
- Danio rerio (7955)
- Drosophila melanogaster (7227)
- Caenorhabditis elegans (6239)
- Anopheles gambiae (180454)
- Bos taurus (9913)
- Canis familiaris (9615)
- Gallus gallus (9031)
- Plasmodium falciparum 3D7 (36329)
- Pan troglodytes (9598)
- Sus scrofa (9823)
When using a record from GEO, GeneTrail 2 relies on the proper normalization of the stored data. If you want to normalize the data yourself, you will need to obtain and process the raw data from GEO and upload a score file.
The SOFT format is supported for GEO Datasets (GDS) and GEO Series (GSE). You must either select one GSE record and distribute its samples into a test set and a control set, or select two GDS records that define your sample set and your reference set.
In this step a score for differential expression between the two groups is calculated.
If your test group consists of multiple samples you can choose from the following scoring schemes:
- Independent Shrinkage t-Test
- Independent Student's t-Test
- Wilcoxon Rank Sum Test
- Signal to Noise Ratio
- F-Test
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient
If, however, your test group consists of only a single sample (e.g. for diagnostic purposes), the test statistics are replaced by the z-score and the following scoring schemes are available:
- z-score
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient
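As a rough illustration of the single-sample case, a z-score relates the one test value of each gene to the mean and standard deviation of the reference group. The sketch below uses the textbook definition; the function name `z_scores` and the dictionary-based input layout are assumptions made for this example and do not describe GeneTrail 2's internal implementation.

```python
from statistics import mean, stdev


def z_scores(sample, reference):
    """Score each gene of a single sample against a reference group.

    sample:    dict mapping gene -> expression value of the single sample
    reference: dict mapping gene -> list of expression values in the
               reference group (at least two values per gene)

    Hypothetical helper: names and data layout are assumptions.
    """
    scores = {}
    for gene, value in sample.items():
        ref_values = reference[gene]
        mu = mean(ref_values)
        sigma = stdev(ref_values)  # sample standard deviation (n - 1)
        if sigma == 0:
            # The z-score is undefined if the reference shows no variation.
            continue
        scores[gene] = (value - mu) / sigma
    return scores


example = z_scores(
    {"GENE_A": 5.0, "GENE_B": 1.0},
    {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [1.0, 1.0, 1.0]},
)
```

Genes without variation in the reference group are skipped here; whether a tool reports them as missing or as zero is a design choice this sketch does not try to settle.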
In this step a score for differential expression between the two groups is calculated.
If your sample group consists of multiple samples you can choose from the following scoring schemes:
- Independent Shrinkage t-Test
- Independent Student's t-Test
- Wilcoxon-Mann-Whitney-Test
- Signal to Noise Ratio
- F-Test
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient
- Mean-Fold-Difference
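To make the simpler scores in this list concrete, the following sketch computes them for one gene from the values of the sample and reference groups. It uses common textbook definitions, which are assumptions here: the signal-to-noise ratio as the difference of means divided by the sum of standard deviations, and a base-2 logarithm for the log quotient. GeneTrail 2's exact formulas may differ, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def signal_to_noise(sample, reference):
    # Assumed definition: difference of means over sum of standard deviations.
    return (mean(sample) - mean(reference)) / (stdev(sample) + stdev(reference))


def mean_fold_quotient(sample, reference):
    # Ratio of the group means; requires a non-zero reference mean.
    return mean(sample) / mean(reference)


def log_mean_fold_quotient(sample, reference):
    # Base 2 is an assumption; it requires a positive quotient.
    return math.log2(mean_fold_quotient(sample, reference))


def mean_fold_difference(sample, reference):
    # Plain difference of the group means.
    return mean(sample) - mean(reference)


sample_values = [7.0, 9.0]
reference_values = [1.0, 3.0]
snr = signal_to_noise(sample_values, reference_values)
quotient = mean_fold_quotient(sample_values, reference_values)
log_quotient = log_mean_fold_quotient(sample_values, reference_values)
difference = mean_fold_difference(sample_values, reference_values)
```

The quotient-based scores are only meaningful for positive, non-log-transformed expression values, whereas the difference is the natural choice for data that is already on a log scale.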
If your sample and reference groups have the same size the following test statistics can also be chosen:
- Paired Student's t-Test
- Wilcoxon Matched Pairs Signed Rank Test
If your sample and reference groups contain more than 15 samples, the following scores can also be chosen:
- Pearson Correlation
- Spearman Correlation
If, however, your sample group consists of only a single sample (e.g. for diagnostic purposes), the test statistics are replaced by the z-score and the following scoring schemes are available:
- z-score
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient
- Mean-Fold-Difference
- Enrichment algorithms:
- Algorithms to find deregulated subgraphs in regulatory networks:
Subgraph size
Here, you can enter either a single value or one or more ranges of values for the size of the subgraph:
- single value: enter the value directly, e.g., 25
- single range: separate the two bounds by a dash, e.g., 10-25
- multiple ranges: separate the ranges by a semicolon, e.g., 1-12; 15-20; 25-30
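The input syntax above can be read as a small grammar: values or ranges, separated by semicolons. The sketch below shows one way such a specification could be expanded into a list of subgraph sizes. The function `parse_sizes` is hypothetical, and treating both range bounds as inclusive is an assumption that the tutorial does not state explicitly.

```python
def parse_sizes(spec):
    """Expand a size specification such as "1-12; 15-20; 25-30".

    Returns a sorted list of distinct sizes. Range bounds are assumed
    to be inclusive (an assumption, not confirmed by the tutorial).
    """
    sizes = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            sizes.update(range(low, high + 1))
        else:
            sizes.add(int(part))
    return sorted(sizes)


single = parse_sizes("25")
one_range = parse_sizes("10-25")
many_ranges = parse_sizes("1-12; 15-20; 25-30")
```

Collecting the sizes in a set means overlapping ranges do not produce duplicate entries.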
Scoring
Scoring mode
You can specify whether you want to use positive and negative values or absolute values as scores.
Node Mapping
Furthermore, you can specify how the scores for composite nodes (families and complexes) are computed:
- Maximum
- This option causes the score of the member with the highest score to be used as the score of the composite node.
- Minimum
- This option causes the score of the member with the lowest score to be used as the score of the composite node.
- Average
- This option computes the average score of all members of a composite node. Please note that if you use the absolute values option, this score can be computed in two ways: either the absolute values are taken before the average is computed, or the absolute value of the computed average is used. You will be able to choose between these two options.
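The difference between the two averaging variants is easiest to see with numbers. The following sketch implements the three node mapping options for a list of member scores. The function name, the parameter names, and the decision to take absolute values before applying maximum or minimum are assumptions for illustration only; the tutorial specifies the order only for the average.

```python
from statistics import mean


def composite_score(member_scores, mode, absolute=False, abs_first=True):
    """Combine member scores of a composite node (family or complex).

    mode:      "maximum", "minimum" or "average"
    absolute:  use absolute values instead of signed scores
    abs_first: for "average" only, take absolute values before averaging
               (True) or take the absolute value of the average (False)

    Hypothetical helper; for maximum and minimum, absolute values are
    applied to the members first (an assumption).
    """
    scores = list(member_scores)
    if absolute and (mode != "average" or abs_first):
        scores = [abs(s) for s in scores]

    if mode == "maximum":
        result = max(scores)
    elif mode == "minimum":
        result = min(scores)
    elif mode == "average":
        result = mean(scores)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    if absolute and mode == "average" and not abs_first:
        result = abs(result)
    return result


members = [-3.0, 1.0]
signed_average = composite_score(members, "average")
abs_before = composite_score(members, "average", absolute=True, abs_first=True)
abs_after = composite_score(members, "average", absolute=True, abs_first=False)
```

For the members -3 and 1, averaging the absolute values gives 2, while the absolute value of the signed average gives 1, so the choice matters whenever a composite node contains both up- and downregulated members.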
Path length
Here, you can enter a single value k to specify the maximal path length, e.g., 25
Scoring
You can specify whether you want to find up- or downregulated paths of length up to k.
Node Mapping
Furthermore, you can specify how the scores for composite nodes (families and complexes) are computed:
- Maximum
- This option causes the score of the member with the highest score to be used as the score of the composite node.
- Minimum
- This option causes the score of the member with the lowest score to be used as the score of the composite node.
- Average
- This option computes the average score of all members of a composite node. Please note that if you use the absolute values option, this score can be computed in two ways: either the absolute values are taken before the average is computed, or the absolute value of the computed average is used. You will be able to choose between these two options.
If you want to view your results in a different viewer or use them for further analysis, you can download them as a raw package.
The subgraph result directory contains the following files:
- The resulting subnetwork for k = YY
- The NCBI gene symbols for all genes
- The scores used in the ILP computation
- The signed scores. This is useful to distinguish between up and down regulation.
- The identifiers of the Database "SomeDBName" that are mapped on the network nodes.