Tutorial for GeneTrail 2:

GeneTrail 2 offers two main variants for data input: you can either download preprocessed and normalized expression data from GEO or provide a precomputed list of scores.

Gene Expression Omnibus

The Gene Expression Omnibus (GEO) is a MIAME-compliant online database for microarray experiments. Normalized data is stored in the GEO SOFT format, whereas unprocessed data is stored in a platform-dependent raw format. Currently, GeneTrail 2 supports the SOFT format for various platforms and organisms:

Supported Organisms
  • Homo sapiens (9606)
  • Mus musculus (10090)
  • Rattus norvegicus (10116)
  • Arabidopsis thaliana (3702)
  • Danio rerio (7955)
  • Drosophila melanogaster (7227)
  • Caenorhabditis elegans (6239)
  • Anopheles gambiae (180454)
  • Bos taurus (9913)
  • Canis familiaris (9615)
  • Gallus gallus (9031)
  • Plasmodium falciparum 3D7 (36329)
  • Pan troglodytes (9598)
  • Sus scrofa (9823)

When using a record from GEO, GeneTrail 2 relies on the proper normalization of the stored data. If you want to normalize the data yourself, you will need to obtain and process the raw data from GEO and upload a score file.

The SOFT format is supported for GEO Datasets (GDS) and GEO Series (GSE). GeneTrail 2 requires you to select either one GSE record and distribute the contained samples into a test set and control set or select two GDS records that define your sample and reference set.

If you choose a GSE file, enter a valid GSE identifier (e.g., GSE14767). The corresponding GEO Series .soft file is then downloaded to the GeneTrail 2 server automatically. In the next step, you can specify the sample and the reference group.

If you choose two GDS files, enter valid GDS identifiers (e.g., GDS2161 and GDS2162) for the test and control group, respectively. The corresponding GEO DataSet .soft files are then downloaded to the GeneTrail 2 server automatically.
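
Before submitting, it can help to check that an identifier has the expected shape. The following is a minimal sketch, assuming that identifiers consist of the prefix GSE or GDS followed by a number, as in the examples above. The function name classify_accession is hypothetical and not part of GeneTrail 2, and the check says nothing about whether the record actually exists in GEO.

```python
import re

# Assumption: an identifier is "GSE" or "GDS" followed by a positive integer.
ACCESSION_PATTERN = re.compile(r"(GSE|GDS)([1-9]\d*)")


def classify_accession(accession: str) -> str:
    """Return "GSE" or "GDS" for a well-formed identifier, else raise ValueError."""
    match = ACCESSION_PATTERN.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GSE or GDS identifier: {accession!r}")
    return match.group(1)


if __name__ == "__main__":
    for identifier in ("GSE14767", "GDS2161", "GDS2162"):
        print(identifier, "is a", classify_accession(identifier), "record")
```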

If you choose a text file, upload a plain text file containing identifiers with or without precomputed scores. The values have to be whitespace-separated.
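
To illustrate the text file input, here is a minimal sketch of how such a file could be read. It assumes one identifier per line, optionally followed by a single numeric score; the exact layout expected by GeneTrail 2 may differ. The function name parse_score_file and the gene symbols in the example are purely illustrative.

```python
from typing import Optional


def parse_score_file(text: str) -> list[tuple[str, Optional[float]]]:
    """Parse lines of the form 'identifier' or 'identifier score'.

    Assumption: one entry per line, fields separated by spaces or tabs.
    """
    entries: list[tuple[str, Optional[float]]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1:
            entries.append((fields[0], None))  # identifier without score
        elif len(fields) == 2:
            entries.append((fields[0], float(fields[1])))
        else:
            raise ValueError(
                f"line {line_number}: expected 1 or 2 fields, got {len(fields)}"
            )
    return entries


if __name__ == "__main__":
    example = "TP53 2.31\nBRCA1\t-1.07\nEGFR 0.45\n"
    for identifier, score in parse_score_file(example):
        print(identifier, score)
```
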

If you have uploaded one GSE file containing the data of both the test group and the control group, the sample identifiers (GSMs) are displayed in the data pool.

You can then select arbitrary GSMs and move them either to the sample group or to the reference group. GeneTrail 2 also provides a link to inspect the GSE file on the NCBI webserver.

In this step a score for differential expression between the two groups is calculated.

If your test group consists of multiple samples, you can choose from the following scoring schemes:

  • Independent Shrinkage t-Test
  • Independent Student's t-Test
  • Wilcoxon Rank Sum Test
  • Signal to Noise Ratio
  • F-Test
  • Log-Mean-Fold-Quotient
  • Mean-Fold-Quotient

If, however, your test group consists of only a single sample (e.g., for diagnostic purposes), all test statistics are replaced by the z-score:

  • z-score
  • Log-Mean-Fold-Quotient
  • Mean-Fold-Quotient

In this step a score for differential expression between the two groups is calculated.

If your sample group consists of multiple samples, you can choose from the following scoring schemes:

  • Independent Shrinkage t-Test
  • Independent Student's t-Test
  • Wilcoxon-Mann-Whitney-Test
  • Signal to Noise Ratio
  • F-Test
  • Log-Mean-Fold-Quotient
  • Mean-Fold-Quotient
  • Mean-Fold-Difference
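
To give an impression of what some of these scores measure, the following sketch computes a few of them for a single gene. It uses common textbook definitions (pooled-variance Student's t, difference of means over the sum of standard deviations for the signal-to-noise ratio) and assumes a base-2 logarithm for the log quotient. These are assumptions; the exact formulas used by GeneTrail 2 may differ.

```python
import math
from statistics import mean, stdev, variance


def students_t(sample, reference):
    """Independent two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(sample), len(reference)
    pooled = ((n1 - 1) * variance(sample) + (n2 - 1) * variance(reference)) / (
        n1 + n2 - 2
    )
    return (mean(sample) - mean(reference)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def signal_to_noise(sample, reference):
    """Difference of means divided by the sum of the standard deviations."""
    return (mean(sample) - mean(reference)) / (stdev(sample) + stdev(reference))


def mean_fold_quotient(sample, reference):
    return mean(sample) / mean(reference)


def log_mean_fold_quotient(sample, reference):
    # Assumption: base-2 logarithm.
    return math.log2(mean(sample) / mean(reference))


def mean_fold_difference(sample, reference):
    return mean(sample) - mean(reference)


if __name__ == "__main__":
    sample_group = [8.0, 10.0, 12.0]    # illustrative expression values
    reference_group = [4.0, 5.0, 6.0]
    print("t:", students_t(sample_group, reference_group))
    print("SNR:", signal_to_noise(sample_group, reference_group))
    print("quotient:", mean_fold_quotient(sample_group, reference_group))
    print("log quotient:", log_mean_fold_quotient(sample_group, reference_group))
    print("difference:", mean_fold_difference(sample_group, reference_group))
```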

If your sample and reference groups have the same size, the following test statistics can also be chosen:

  • Paired Student's t-Test
  • Wilcoxon Matched Pairs Signed Rank Test

If your sample and reference groups contain more than 15 samples, the following scores can also be chosen:

  • Pearson Correlation
  • Spearman Correlation

If, however, your sample group consists of only a single sample (e.g., for diagnostic purposes), all test statistics are replaced by the z-score:

  • z-score
  • Log-Mean-Fold-Quotient
  • Mean-Fold-Quotient
  • Mean-Fold-Difference
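
For the single-sample case, a z-score can be sketched as follows. This assumes the usual definition, in which the sample value is compared with the mean and the sample standard deviation of the reference group; whether GeneTrail 2 uses exactly this estimator is an assumption.

```python
from statistics import mean, stdev


def z_score(value, reference):
    """Distance of a single value from the reference mean, in reference SDs."""
    return (value - mean(reference)) / stdev(reference)


if __name__ == "__main__":
    reference_group = [4.0, 5.0, 6.0]  # illustrative expression values
    print(z_score(8.0, reference_group))
```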

GeneTrail 2 offers several methods to analyse your data. Currently, the following algorithms are supported:
  • Enrichment algorithms:
    • Gene Set Enrichment Analysis
    • Weighted Gene Set Enrichment Analysis
    • Over Representation Analysis
    • Wilcoxon Rank Sum Test
    • One Sample t-Test
    • Two Sample t-Test
    • Mean/Median/Sum of single gene statistic
    • Max-Mean statistic
  • Algorithms to find deregulated subgraphs in regulatory networks:

Additionally, you can download your annotated scoring file.
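
As an illustration of one of the enrichment algorithms listed above, Over Representation Analysis is commonly based on the hypergeometric distribution: given a reference set of genes, a category, and a test set, it asks how likely it is to see at least the observed overlap by chance. The sketch below shows this generic computation. The function and parameter names are hypothetical, and GeneTrail 2 may implement the test differently.

```python
from math import comb


def ora_p_value(population: int, category: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes in the reference set
    category:   number of reference genes that belong to the category
    selected:   number of genes in the test set
    overlap:    number of test set genes that belong to the category
    """
    upper = min(category, selected)
    total = comb(population, selected)
    tail = sum(
        comb(category, k) * comb(population - category, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers: 10 genes in total, 4 in the category,
    # 3 selected, 2 of which fall into the category.
    print(ora_p_value(population=10, category=4, selected=3, overlap=2))
```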

Subgraph size

Here, you can either enter a single value or a range of values for the size of the subgraph:

  • Single value: enter one number, e.g., 25
  • Single range: separate the two values by a dash, e.g., 10-25
  • Multiple ranges: separate the ranges by a semicolon, e.g., 1-12; 15-20; 25-30
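
The input syntax above can be sketched as a small parser. This is only an illustration of the notation; it assumes that ranges are inclusive on both ends, and the function name parse_sizes is hypothetical.

```python
def parse_sizes(spec: str) -> list[int]:
    """Expand a size specification such as '1-12; 15-20; 25-30'.

    Assumption: ranges include both end points.
    """
    sizes: set[int] = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            sizes.update(range(low, high + 1))
        else:
            sizes.add(int(part))
    return sorted(sizes)


if __name__ == "__main__":
    print(parse_sizes("25"))
    print(parse_sizes("10-25"))
    print(parse_sizes("1-12; 15-20; 25-30"))
```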

Scoring

Scoring mode
You can specify whether you want to use positive and negative values or absolute values as scores.

Node Mapping
Furthermore, you can specify how the scores for composite nodes (families and complexes) are computed:

  • Maximum: The score of the member with the highest score is used as the score of the composite node.
  • Minimum: The score of the member with the lowest score is used as the score of the composite node.
  • Average: The average score of all members of the composite node is used. Please note that if you use the absolute values option, there are two ways to compute this score: either the absolute values are taken before computing the average, or the absolute value of the computed average is used. You will be able to choose between these two options.
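
The node mapping options can be sketched as follows. The example shows in particular how the two variants of the average differ when absolute values are used. How absolute values interact with the maximum and minimum options is an assumption here (absolute values are taken first), and all names are hypothetical.

```python
from statistics import mean


def composite_score(member_scores, mode, use_absolute=False, absolute_first=True):
    """Score of a composite node (family or complex) from its member scores.

    mode:           "maximum", "minimum" or "average"
    use_absolute:   use absolute values as scores
    absolute_first: for "average" with absolute values, take absolute values
                    before averaging (True) or afterwards (False)
    """
    if mode == "average" and use_absolute and not absolute_first:
        return abs(mean(member_scores))
    # Assumption: for all other combinations, absolute values are taken first.
    scores = [abs(s) for s in member_scores] if use_absolute else list(member_scores)
    if mode == "maximum":
        return max(scores)
    if mode == "minimum":
        return min(scores)
    if mode == "average":
        return mean(scores)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    members = [-3.0, 1.0, 2.0]  # illustrative member scores
    print(composite_score(members, "maximum"))
    print(composite_score(members, "minimum"))
    print(composite_score(members, "average"))
    print(composite_score(members, "average", use_absolute=True, absolute_first=True))
    print(composite_score(members, "average", use_absolute=True, absolute_first=False))
```

With the scores -3, 1 and 2, averaging the absolute values gives 2, whereas the absolute value of the average is 0, so the choice between the two variants can change the result considerably.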

Path length

Here, you can enter a single value k to specify the maximal path length, e.g.: 25

Scoring

You can specify whether you want to find up- or downregulated paths up to a length of k.

Node Mapping
Furthermore, you can specify how the scores for composite nodes (families and complexes) are computed:

  • Maximum: The score of the member with the highest score is used as the score of the composite node.
  • Minimum: The score of the member with the lowest score is used as the score of the composite node.
  • Average: The average score of all members of the composite node is used. Please note that if you use the absolute values option, there are two ways to compute this score: either the absolute values are taken before computing the average, or the absolute value of the computed average is used. You will be able to choose between these two options.

You can select all parameters that should be used for your analysis.

If you enter a valid email address, a link to the results of your analysis will be sent to the specified address. If you decide not to enter an email address, please do not close your browser window; otherwise, the results of your analysis cannot be returned to you.

Pressing the "cancel job" button will abort your analysis. If instances of your problem have already been computed successfully, GeneTrail 2 will pack these and provide them for download. The provided link for (re-)downloading and viewing your results will remain valid for two weeks.

Your computation has completed successfully and you can now download or view your results. For visualization, GeneTrail 2 lets you choose between a simple browser-based solution using the Cytoscape JS library and BiNA, a graphical tool for network analysis.

If you want to view your results in a different viewer or use them for further analysis, you can download them as a raw package.

The subgraph result directory contains the following files:

  • The resulting subnetwork for k = YY
  • The NCBI gene symbols for all genes
  • The scores used in the ILP computation
  • The signed scores. This is useful to distinguish between up and down regulation.
  • The identifiers of the database "SomeDBName" that are mapped on the network nodes.

The ids used in the SIF files (and on the left-hand side of the NA files) are arbitrary, unique identifiers for each gene and bear no further meaning.
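
If you want to process the raw package yourself, the following sketch shows how the ids in a SIF file could be joined with the values from an NA file. It assumes the usual Cytoscape conventions: each SIF line has the form "source interaction target [target ...]", and an NA file starts with a line naming the attribute, followed by lines of the form "id = value". The files produced by GeneTrail 2 may deviate from this, and the ids, interaction labels and gene symbols below are invented for illustration.

```python
def parse_sif(text: str) -> list[tuple[str, str, str]]:
    """Return (source, interaction, target) edges from SIF-formatted text."""
    edges = []
    for line in text.splitlines():
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [field for field in fields if field]
        if len(fields) < 3:
            continue  # blank line or isolated node without edges
        source, interaction, *targets = fields
        edges.extend((source, interaction, target) for target in targets)
    return edges


def parse_na(text: str) -> dict[str, str]:
    """Return a mapping from node id to attribute value from NA-formatted text."""
    lines = [line for line in text.splitlines() if line.strip()]
    attributes = {}
    for line in lines[1:]:  # the first line holds the attribute name
        node_id, _, value = line.partition("=")
        attributes[node_id.strip()] = value.strip()
    return attributes


if __name__ == "__main__":
    sif_text = "n1\tactivation\tn2\nn2\tinhibition\tn3\tn4\n"
    na_text = "GeneSymbol\nn1 = TP53\nn2 = MDM2\nn3 = CDKN1A\nn4 = BAX\n"
    symbols = parse_na(na_text)
    for source, interaction, target in parse_sif(sif_text):
        print(symbols.get(source, source), interaction, symbols.get(target, target))
```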

The Biological Network Analyzer (BiNA) is a workbench for visualizing and analyzing biological networks. Various biological networks can be displayed, edited and analyzed. Please visit http://www.bina.unipax.info for more information.

With Cytoscape JS you can inspect the networks immediately in your browser.