# GeneTrail2 1.5

#### Statistical analysis of molecular signatures

#### Tutorial for GeneTrail 2

GeneTrail 2 offers two main variants for data input. You may either download preprocessed and normalized expression data from GEO or provide a precomputed list of scores.

#### Gene Expression Omnibus

The Gene Expression Omnibus (GEO) is a MIAME-compliant online database for microarray experiments. Normalized data is stored in the GEO SOFT format, whereas unprocessed data is stored in a platform-dependent raw format. Currently, *GeneTrail 2* supports the SOFT format for various platforms and organisms:

##### Supported Organisms

- Homo sapiens (9606)
- Mus musculus (10090)
- Rattus norvegicus (10116)
- Arabidopsis thaliana (3702)
- Danio rerio (7955)
- Drosophila melanogaster (7227)
- Caenorhabditis elegans (6239)
- Anopheles gambiae (180454)
- Bos taurus (9913)
- Canis familiaris (9615)
- Gallus gallus (9031)
- Plasmodium falciparum 3D7 (36329)
- Pan troglodytes (9598)
- Sus scrofa (9823)

When using a record from GEO, *GeneTrail 2* relies on the stored data being properly normalized. If you want to normalize the data yourself, you need to obtain and process the raw data from GEO and upload a score file.

The SOFT format is supported for GEO DataSets (GDS) and GEO Series (GSE). *GeneTrail 2* requires you either to select one GSE record and distribute the contained samples into a test set and a control set, or to select two GDS records that define your sample and reference set.

Depending on the chosen input, you provide one of the following:

- **a GSE file**: enter a valid GSE identifier (e.g., *GSE14767*). The corresponding GEO Series .soft file is then downloaded to the *GeneTrail 2* server automatically. In the next step, you may specify the sample and the reference group.
- **two GDS files**: enter valid GDS identifiers (e.g., *GDS2161* and *GDS2162*) for the test and control group, respectively. The corresponding GEO DataSet .soft files are then downloaded to the *GeneTrail 2* server automatically.
- **a text file**: upload a plain text file containing identifiers with or without precomputed scores. The values have to be separated by whitespace.

For a GSE record, assign each of the contained samples either to the *sample group* or to the *reference group*. *GeneTrail 2* also provides a link to inspect the GSE record on the NCBI web server.
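Assuming the text file holds one identifier per line, optionally followed by a score, the following Python sketch shows how such a file could be read and sanity-checked before upload. The function name `parse_score_file`, the limit of two fields per line, and the rule that either all or no lines carry a score are our assumptions, not documented *GeneTrail 2* behaviour.

```python
from typing import List, Optional, Tuple


def parse_score_file(text: str) -> List[Tuple[str, Optional[float]]]:
    """Parse lines of the form 'identifier' or 'identifier score'.

    Hypothetical helper: fields may be separated by any whitespace
    (spaces or tabs), blank lines are skipped, and either every line
    carries a score or none does.
    """
    entries: List[Tuple[str, Optional[float]]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # ignore blank lines
        if len(fields) > 2:
            raise ValueError(f"line {line_no}: expected 1 or 2 fields, got {len(fields)}")
        score = float(fields[1]) if len(fields) == 2 else None
        entries.append((fields[0], score))
    # Reject files that mix scored and unscored identifiers.
    has_score = {score is not None for _, score in entries}
    if len(has_score) > 1:
        raise ValueError("mixed lines with and without scores")
    return entries
```

For example, `parse_score_file("TP53\t2.5\nEGFR 0\n")` yields `[("TP53", 2.5), ("EGFR", 0.0)]`.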

In this step a score for differential expression between the two groups is calculated.

If your test group consists of multiple samples, you can choose from the following scoring schemes:

- Independent Shrinkage t-Test
- Independent Student's t-Test
- Wilcoxon Rank Sum Test
- Signal to Noise Ratio
- F-Test
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient

If, however, your test group consists of only a single sample (e.g., for diagnostic purposes), the statistical tests are replaced by the z-score, and the following scoring schemes are available:

- z-score
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient

In this step a score for differential expression between the two groups is calculated.

If your sample group consists of multiple samples, you can choose from the following scoring schemes:

- Independent Shrinkage t-Test
- Independent Student's t-Test
- Wilcoxon-Mann-Whitney-Test
- Signal to Noise Ratio
- F-Test
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient
- Mean-Fold-Difference
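Each of these schemes compares the sample group with the reference group one gene at a time. The Python sketch below computes several of them from their common textbook definitions; it is not *GeneTrail 2*'s implementation. Assumptions: the Student's t-test uses a pooled variance, the signal-to-noise ratio is `(mean_s - mean_r) / (sd_s + sd_r)`, and the log fold quotient uses base 2. All function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def student_t(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Independent two-sample Student's t-statistic with pooled variance."""
    n1, n2 = len(sample), len(reference)
    pooled_var = ((n1 - 1) * stdev(sample) ** 2 + (n2 - 1) * stdev(reference) ** 2) / (n1 + n2 - 2)
    return (mean(sample) - mean(reference)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def signal_to_noise(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Difference of means divided by the sum of the standard deviations."""
    return (mean(sample) - mean(reference)) / (stdev(sample) + stdev(reference))


def mean_fold_quotient(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Ratio of the group means (assumes positive expression values)."""
    return mean(sample) / mean(reference)


def log_mean_fold_quotient(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Base-2 logarithm of the mean fold quotient (base is an assumption)."""
    return math.log2(mean(sample) / mean(reference))


def mean_fold_difference(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Difference of the group means."""
    return mean(sample) - mean(reference)
```

Only the raw statistics are computed here; p-values, variance shrinkage, and the rank-based Wilcoxon-Mann-Whitney test are left out.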

If your sample and reference groups have the same size, the following test statistics can also be chosen:

- Paired Student's t-Test
- Wilcoxon Matched Pairs Signed Rank Test
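A paired test works on the per-pair differences instead of on the two group means. A minimal Python sketch of the paired Student's t-statistic, assuming that the i-th sample is matched with the i-th reference sample (how *GeneTrail 2* pairs samples is not described here); `paired_t` is a hypothetical name:

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t(sample: Sequence[float], reference: Sequence[float]) -> float:
    """Paired Student's t-statistic; sample[i] is matched with reference[i]."""
    if len(sample) != len(reference):
        raise ValueError("paired tests require groups of equal size")
    # Differences within each matched pair.
    diffs = [s - r for s, r in zip(sample, reference)]
    # Mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```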

If your sample and reference groups contain more than 15 samples, the following correlation measures can also be chosen:

- Pearson Correlation
- Spearman Correlation

If, however, your sample group consists of only a single sample (e.g., for diagnostic purposes), the statistical tests are replaced by the z-score, and the following scoring schemes are available:

- z-score
- Log-Mean-Fold-Quotient
- Mean-Fold-Quotient
- Mean-Fold-Difference
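With a single sample there is no within-group variance, so the sample's value is instead compared with the spread of the reference group. A minimal sketch, assuming the z-score uses the mean and the sample standard deviation of the reference group (whether *GeneTrail 2* uses the sample or the population standard deviation is not stated); `z_score` and `z_scores` are hypothetical names:

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def z_score(value: float, reference: Sequence[float]) -> float:
    """Distance of one sample's value from the reference mean, in reference SDs."""
    return (value - mean(reference)) / stdev(reference)


def z_scores(sample: Dict[str, float], reference: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply z_score gene by gene; genes missing from the reference are skipped."""
    return {gene: z_score(sample[gene], reference[gene]) for gene in sample if gene in reference}
```

For instance, a value of `7.0` against a reference of `[1.0, 2.0, 3.0]` (mean 2, standard deviation 1) gives a z-score of `5.0`.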

*GeneTrail 2* offers several methods to analyse your data. Currently, the following algorithms are supported:

- Enrichment algorithms
- Algorithms to find deregulated subgraphs in regulatory networks

#### Subgraph size

Here, you can enter either a single value or one or more ranges of values for the size of the subgraph:

- **single value**: enter a single number, e.g., `25`
- **single range**: separate the lower and upper bound by a dash, e.g., `10-25`
- **multiple ranges**: separate the ranges by a semicolon, e.g., `1-12; 15-20; 25-30`
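The size syntax above can be expanded into an explicit list of subgraph sizes. The following Python sketch assumes that ranges include both bounds and that overlapping ranges are merged; `parse_subgraph_sizes` is a hypothetical helper, not part of *GeneTrail 2*:

```python
from typing import List


def parse_subgraph_sizes(spec: str) -> List[int]:
    """Expand '25', '10-25' or '1-12; 15-20; 25-30' into sorted, unique sizes."""
    sizes = set()
    for part in spec.split(";"):  # ranges are separated by semicolons
        part = part.strip()
        if not part:
            continue
        if "-" in part:  # 'low-high', assumed inclusive on both ends
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range {part!r}")
            sizes.update(range(low, high + 1))
        else:  # a single value
            sizes.add(int(part))
    return sorted(sizes)
```

Under these assumptions, `1-12; 15-20; 25-30` expands to 24 sizes (12 + 6 + 6).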

#### Scoring

##### Scoring mode

You can specify whether you want to use signed values (positive and negative) or absolute values as scores.

##### Node Mapping

Furthermore, you can specify how the scores for *composite nodes* (families and complexes) are computed:

- **Maximum**: the highest score among the members is used as the score of the composite node.
- **Minimum**: the lowest score among the members is used as the score of the composite node.
- **Average**: the average score of all members is used as the score of the composite node. Please note that if you use the absolute values option, there are two ways to compute this score: either absolute values are taken before computing the average, or the absolute value of the computed average is used. You can choose between these two options.
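The node-mapping options can be summarized in a single aggregation function. A Python sketch; the mode names and parameter names are hypothetical, and applying Maximum and Minimum to absolute values when the absolute values option is set is our assumption, not taken from *GeneTrail 2*:

```python
from statistics import mean
from typing import Sequence


def composite_score(member_scores: Sequence[float], mode: str,
                    absolute: bool = False, abs_before_average: bool = True) -> float:
    """Aggregate member scores of a composite node (family or complex).

    mode: 'max', 'min' or 'average' (hypothetical names).
    absolute: use absolute values instead of signed scores.
    abs_before_average: only for 'average' with absolute=True; take |x| of
        each member before averaging (True) or |mean| afterwards (False).
    """
    if mode == "average":
        if absolute and abs_before_average:
            return mean([abs(s) for s in member_scores])
        avg = mean(member_scores)
        return abs(avg) if absolute else avg
    values = [abs(s) for s in member_scores] if absolute else list(member_scores)
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    raise ValueError(f"unknown mode {mode!r}")
```

For member scores `[2.0, -4.0]`, averaging gives `-1.0` for signed scores, `3.0` when absolute values are taken first, and `1.0` when the absolute value of the average is taken, which is why the two absolute-value variants are offered separately.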

#### Path length

Here, you can enter a single value k to specify the maximum path length, e.g., `25`.

#### Scoring

You can specify whether you want to find upregulated or downregulated paths of length up to k.
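To illustrate what a search for the most deregulated path of length up to k involves, here is a simplified dynamic-programming sketch in Python. It is *not* the algorithm used by *GeneTrail 2*: it scores a path by the plain sum of its node scores, counts length as the number of nodes, and allows nodes to be revisited (so it finds walks, which coincide with simple paths only in acyclic networks). Downregulated paths are found by negating the scores. `best_walk` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def best_walk(adjacency: Dict[str, List[str]], scores: Dict[str, float],
              k: int, upregulated: bool = True) -> Tuple[float, List[str]]:
    """Highest-scoring (or lowest-scoring) walk with at most k nodes."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sign = 1.0 if upregulated else -1.0
    # best[v]: (score, walk) of the best walk of the current length ending at v.
    best = {v: (sign * scores[v], [v]) for v in scores}
    overall = max(best.values(), key=lambda item: item[0])
    for _ in range(k - 1):
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for u, (score_u, walk_u) in best.items():
            for v in adjacency.get(u, []):  # extend each walk by one edge
                candidate = score_u + sign * scores[v]
                if v not in nxt or candidate > nxt[v][0]:
                    nxt[v] = (candidate, walk_u + [v])
        if not nxt:
            break  # no walk can be extended any further
        best = nxt
        overall = max([overall, *best.values()], key=lambda item: item[0])
    # Undo the sign flip so the returned score uses the original scale.
    return sign * overall[0], overall[1]
```

In a network `a -> b -> c` with an extra edge `a -> c` and scores a = 1, b = -2, c = 3, the best upregulated walk for k = 2 is `a, c` with score 4, while the most downregulated one is the single node `b` with score -2.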

##### Node Mapping

Furthermore, you can specify how the scores for *composite nodes* (families and complexes) are computed:

- **Maximum**: the highest score among the members is used as the score of the composite node.
- **Minimum**: the lowest score among the members is used as the score of the composite node.
- **Average**: the average score of all members is used as the score of the composite node. Please note that if you use the absolute values option, there are two ways to compute this score: either absolute values are taken before computing the average, or the absolute value of the computed average is used. You can choose between these two options.

*GeneTrail 2* will pack the results and provide them for download. The link for (re-)downloading and viewing your results remains valid for two weeks.

If you want to view your results in a different viewer or use them for further analysis, you can download them as a raw package.

The subgraph result directory contains the following files:

- The resulting subnetwork for k = YY
- The NCBI gene symbols for all genes
- The scores used in the ILP computation
- The signed scores. This is useful to distinguish between up and down regulation.
- The identifiers of the database "SomeDBName" that are mapped onto the network nodes.