## P-value computation

To determine the significance of the computed set-level scores, GeneTrail2 offers two strategies: the gene set strategy and the phenotype strategy.

### Gene set strategy

The gene set strategy is based on permuting the identifier-level scores. For some set-level statistics, this strategy allows p-values to be computed directly, avoiding costly permutation tests. The resulting p-values have a higher resolution and are obtained with very low computation times. For an in-depth discussion of the advantages and disadvantages of the respective methods, we refer the reader to Tian et al. [1].

In the following sections, we discuss the methods implemented to compute p-values based on this strategy.

#### Method 1 - Use the underlying distribution

The first approach takes advantage of the known distribution of a test statistic under the null hypothesis. If this distribution, or its cumulative distribution function (CDF), is known or can be estimated, the p-value follows directly from the observed statistic: evaluating the CDF at the observed value yields the lower-tailed p-value, and one minus this value yields the upper-tailed p-value.

#### Set-level statistics that use this method

• One sample t-test
• Welch t-test
• Wilcoxon rank sum test
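As an illustration of Method 1, the sketch below computes an upper-tailed p-value for the Wilcoxon rank sum test listed above, using the normal approximation to the rank-sum distribution. The function names, the use of `statistics.NormalDist`, and the omission of a tie correction in the variance are assumptions made for this example; GeneTrail2 may compute the statistic or its distribution differently.

```python
from statistics import NormalDist

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for idx in order[start:end + 1]:
            ranks[idx] = mean_rank
        start = end + 1
    return ranks

def wilcoxon_upper_p(scores, in_set):
    """Upper-tailed p-value for 'set members score higher than the remaining identifiers'."""
    ranks = average_ranks(scores)
    n1 = sum(in_set)
    n2 = len(scores) - n1
    w = sum(r for r, member in zip(ranks, in_set) if member)  # rank sum of the set
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # no tie correction (simplification)
    z = (w - mean) / sd
    # Upper tail: one minus the CDF; the lower tail would be NormalDist().cdf(z)
    return 1.0 - NormalDist().cdf(z)

scores = [3.1, 2.5, 2.2, 0.4, -0.1, -0.7, -1.2, -2.0]
in_set = [True, True, True, False, False, False, False, False]
p = wilcoxon_upper_p(scores, in_set)  # about 0.013
```

Because the p-value is read off a distribution function, no permutations are needed and the result is not limited to multiples of 1/N.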

#### Method 2 - Permutation tests

If the underlying distribution is not known, a permutation test has to be used instead. In this test, the labels of the observed data points are rearranged and the test statistic is recalculated to estimate its distribution under the null hypothesis [2]. The p-value is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which was derived from the non-permuted data [3]. To obtain an exact p-value, all possible permutations would have to be computed. In practice, however, this is computationally expensive and often infeasible. For example, with two classes of 50 samples each, the class labels can be assigned in $\binom{100}{50} \approx 10^{29}$ different ways [3]. Therefore, p-values are often approximated using a limited number of random permutations [3].
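The size of the permutation space in the example above can be checked directly. This small sketch counts the ways of choosing which 50 of 100 samples carry the first class label, assuming the two classes are distinguishable.

```python
from math import comb

# Ways to choose which 50 of the 100 samples belong to the first class
n_labelings = comb(100, 50)
print(f"{n_labelings:.2e}")  # 1.01e+29
```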

Given a test statistic $t$ and $N$ permutation values $\hat t_1, \hat t_2, ..., \hat t_N$, the one-sided p-values are defined as [3]:

$$p_{upper}=\frac{1}{N}\sum_{i=1}^{N}I(\hat t_i \ge t)$$ $$p_{lower}=\frac{1}{N}\sum_{i=1}^{N}I(\hat t_i \le t)$$
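The two formulas translate directly into code. The following sketch assumes the permutation values have already been computed and are passed in as a list; the function name is hypothetical.

```python
def empirical_p_values(t, permuted):
    """One-sided permutation p-values (p_upper, p_lower) without a pseudocount."""
    n = len(permuted)
    p_upper = sum(1 for t_hat in permuted if t_hat >= t) / n  # fraction with t_hat >= t
    p_lower = sum(1 for t_hat in permuted if t_hat <= t) / n  # fraction with t_hat <= t
    return p_upper, p_lower

print(empirical_p_values(2.0, [0.5, 2.0, -1.0, 3.0]))   # (0.5, 0.75)
print(empirical_p_values(10.0, [0.5, 2.0, -1.0, 3.0]))  # (0.0, 1.0)
```

The second call shows the problem addressed next: if no permutation value is as extreme as $t$, the estimate is exactly 0.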

Commonly, a pseudocount is introduced to avoid p-values of 0 [3]:

$$p_{upper}=\frac{1}{N+1}\left(1+\sum_{i=1}^{N}I(\hat t_i \ge t)\right)$$ $$p_{lower}=\frac{1}{N+1}\left(1+\sum_{i=1}^{N}I(\hat t_i \le t)\right)$$
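A minimal sketch of the pseudocount variant follows. It uses the common $(1 + \text{count})/(N + 1)$ form, which treats the observed statistic as one additional draw under the null hypothesis; dividing by $N + 1$ rather than $N$ keeps the estimate at or below 1 even when every permutation value is at least as extreme as $t$. The function name is hypothetical.

```python
def p_values_with_pseudocount(t, permuted):
    """One-sided permutation p-values with a pseudocount of 1."""
    n = len(permuted)
    count_upper = sum(1 for t_hat in permuted if t_hat >= t)
    count_lower = sum(1 for t_hat in permuted if t_hat <= t)
    # The observed statistic is counted once in numerator and denominator
    return (1 + count_upper) / (n + 1), (1 + count_lower) / (n + 1)

print(p_values_with_pseudocount(10.0, [0.5, 2.0, -1.0, 3.0]))  # (0.2, 1.0)
```

The smallest attainable p-value is now $1/(N+1)$ instead of 0.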

#### Schematic description

1. Compute the test statistic $t$ for the original score list $L$.
2. Repeat the following steps $N$ times:
   1. Generate a random permutation $\hat L$ of $L$.
   2. Evaluate the test statistic on $\hat L$, yielding $\hat t$.
   3. If $\hat t \ge t$, increase a counter $X$.
3. Output the p-value $X/N$.

Accordingly, a lower-tailed p-value can be calculated by replacing $\hat t \ge t$ with $\hat t \le t$.
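The gene set strategy described above can be sketched as follows. For illustration, the set-level statistic is assumed to be the mean identifier-level score of the set members (one of the averaging methods), and the scores are supplied as a plain dictionary; all names here are hypothetical and do not reflect GeneTrail2's internals.

```python
import random

def set_mean(scores, members):
    """Set-level statistic (an averaging method): mean score of the set members."""
    return sum(scores[g] for g in members) / len(members)

def gene_set_permutation_p(scores, members, n_perm=1000, seed=0):
    """Upper-tailed p-value following the gene set strategy schematic."""
    rng = random.Random(seed)
    t = set_mean(scores, members)            # step 1: statistic on the original list L
    ids = list(scores)
    values = [scores[g] for g in ids]
    count = 0                                 # the counter X
    for _ in range(n_perm):                   # step 2: repeat N times
        rng.shuffle(values)                   # 2.1: random permutation of the scores
        permuted = dict(zip(ids, values))
        if set_mean(permuted, members) >= t:  # 2.2 and 2.3
            count += 1
    return count / n_perm                     # step 3: X / N

scores = {f"g{i}": 19 - i for i in range(20)}  # toy scores: g0 highest, g19 lowest
p_top = gene_set_permutation_p(scores, ["g0", "g1", "g2", "g3"])         # close to 0
p_bottom = gene_set_permutation_p(scores, ["g16", "g17", "g18", "g19"])  # 1.0
```

For the bottom set, every permutation yields a mean at least as large as the observed one, so the upper-tailed p-value is exactly 1. Swapping `>=` for `<=` gives the lower-tailed variant.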

#### Set-level statistics that use this method

• All averaging methods
• Weighted GSEA

#### Method 3 - Exact p-values

Methods have also been proposed that compute exact p-values without enumerating all permutations. Keller et al. [4] propose such an algorithm for the unweighted version of the GSEA method.

#### Set-level statistics that use this method

• Unweighted GSEA
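To illustrate how an exact p-value can be obtained without enumerating every permutation, the sketch below uses the classic lattice-path counting for one-sided Kolmogorov-Smirnov-type running sums. It is not Keller et al.'s published algorithm. It assumes an unweighted running sum in which each set member ("hit") adds $1/k$ and each other identifier ("miss") subtracts $1/(n-k)$, and it takes the maximum positive deviation as the statistic; both choices are assumptions. A brute-force enumeration is included only to check the dynamic program on small inputs.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def scaled_running_max(hit_positions, n, k):
    """Max of the unweighted running sum, scaled by k*(n-k) to stay integral."""
    m = n - k
    hits = set(hit_positions)
    total = best = 0
    for pos in range(n):
        total += m if pos in hits else -k  # hit: +1/k, miss: -1/(n-k), after scaling
        best = max(best, total)
    return best

def exact_upper_p(t_obs, n, k):
    """P(max running sum >= t_obs) over all C(n, k) placements, via path counting."""
    m = n - k
    # f[i][j]: number of paths with i hits and j misses whose prefixes all stay below t_obs
    f = [[0] * (m + 1) for _ in range(k + 1)]
    for i in range(k + 1):
        for j in range(m + 1):
            if i * m - j * k >= t_obs:
                continue  # this lattice point reaches the threshold; leave count at 0
            if i == 0 and j == 0:
                f[i][j] = 1
            else:
                f[i][j] = (f[i - 1][j] if i else 0) + (f[i][j - 1] if j else 0)
    total = comb(n, k)
    return Fraction(total - f[k][m], total)

def brute_force_upper_p(t_obs, n, k):
    """Reference check: enumerate every placement of the k set members."""
    hits = sum(1 for c in combinations(range(n), k)
               if scaled_running_max(c, n, k) >= t_obs)
    return Fraction(hits, comb(n, k))

t_obs = scaled_running_max((0, 1, 2), n=10, k=3)  # all set members at the top of the list
p = exact_upper_p(t_obs, n=10, k=3)               # Fraction(1, 120)
```

The dynamic program visits $(k+1)(n-k+1)$ lattice points instead of $\binom{n}{k}$ placements, which is what makes exact values feasible for realistic list sizes.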

### Phenotype strategy

The phenotype strategy randomly redistributes the measurements between the sample and reference groups. This strategy always requires a permutation test. As new identifier-level scores must be derived for every permutation, the method can only be used if a data matrix was supplied.

The phenotype and the gene set strategies differ in how the permuted score list is computed: the phenotype strategy permutes the group labels of the samples, whereas the gene set strategy permutes the gene labels.

#### Schematic description

1. Compute the test statistic $t$ for the original score list $L$.
2. Repeat the following steps $N$ times:
   1. Generate a random assignment of the samples to the sample and reference groups.
   2. Compute a new score list $\hat L$.
   3. Evaluate the test statistic on $\hat L$, yielding $\hat t$.
   4. If $\hat t \ge t$, increase a counter $X$.
3. Output the p-value $X/N$.

Accordingly, a lower-tailed p-value can be calculated by replacing $\hat t \ge t$ with $\hat t \le t$.
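The phenotype strategy can be sketched in the same way. Here the data matrix is assumed to be a dictionary mapping each gene to its row of measurements, the identifier-level score is assumed to be a simple difference of group means, and the set-level statistic is the mean score of the set members. These are hypothetical stand-ins for whichever scoring methods are actually configured.

```python
import random
from statistics import fmean

def gene_scores(matrix, is_sample):
    """Hypothetical identifier-level score: mean(sample) - mean(reference) per gene."""
    scores = {}
    for gene, row in matrix.items():
        sample = [x for x, s in zip(row, is_sample) if s]
        reference = [x for x, s in zip(row, is_sample) if not s]
        scores[gene] = fmean(sample) - fmean(reference)
    return scores

def set_statistic(scores, members):
    """Set-level statistic: mean identifier-level score of the set members."""
    return fmean([scores[g] for g in members])

def phenotype_permutation_p(matrix, is_sample, members, n_perm=1000, seed=0):
    """Upper-tailed p-value following the phenotype strategy schematic."""
    rng = random.Random(seed)
    t = set_statistic(gene_scores(matrix, is_sample), members)  # step 1
    labels = list(is_sample)
    count = 0                                       # the counter X
    for _ in range(n_perm):                         # step 2: repeat N times
        rng.shuffle(labels)                         # 2.1: random group assignment
        permuted = gene_scores(matrix, labels)      # 2.2: new score list
        if set_statistic(permuted, members) >= t:   # 2.3 and 2.4
            count += 1
    return count / n_perm                           # step 3: X / N

# Toy data matrix: rows are genes, columns are samples (first three = sample group)
matrix = {
    "a": [5.0, 6.0, 7.0, 0.0, 1.0, 2.0],
    "b": [4.0, 5.0, 6.0, 1.0, 0.0, 1.0],
    "c": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
is_sample = [True, True, True, False, False, False]
p = phenotype_permutation_p(matrix, is_sample, ["a", "b"])
```

With six samples there are only $\binom{6}{3} = 20$ distinct group assignments, and only the original one reaches the observed statistic here, so the estimate hovers around 0.05. This illustrates why the resolution of the phenotype strategy depends on the number of samples rather than on $N$ alone.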

### Bibliography

1. Tian, L., Greenberg, S. A., Kong, S. W., Altschuler, J., Kohane, I. S., Park, P. J. Discovering statistically significant pathways in expression profiling studies. Proceedings of the National Academy of Sciences of the United States of America. National Acad Sciences.
2. Edgington, E., Onghena, P. Randomization tests. CRC Press.
3. Knijnenburg, T. A., Wessels, L. F. A., Reinders, M. J. T., Shmulevich, I. Fewer permutations, more accurate P-values. Bioinformatics. Oxford Univ Press.
4. Keller, A., Backes, C., Lenhof, H. P. Computation of significance scores of unweighted Gene Set Enrichment Analyses. BMC Bioinformatics.