In an enrichment analysis, multiple categories are tested simultaneously. Each individual test uses the same significance threshold ($\alpha$) to judge whether a category is significant. For a single test whose null hypothesis is true, $\alpha$ is therefore the probability of making a false positive prediction (Type-I-Error). The problem with multiple testing is that this probability accumulates over all tests.

For $k$ independent tests whose null hypotheses are all true, the probability of obtaining at least one (false) significant result is:

$$P(\text{at least one significant result}) = 1-(1-\alpha)^k$$
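
To get a feeling for how quickly this probability grows, the formula can be evaluated for a few values of $k$. The following is a minimal sketch; the function name `prob_any_false_positive` is hypothetical and, like the formula, it assumes independent tests with true null hypotheses.

```python
def prob_any_false_positive(alpha: float, k: int) -> float:
    """Return 1 - (1 - alpha)^k, assuming k independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** k


# Print the accumulated probability for a growing number of tests.
for k in (1, 10, 100):
    print(k, round(prob_any_false_positive(0.05, k), 3))
```

With $\alpha = 0.05$, ten tests already give a chance of roughly 40% of seeing at least one false positive.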

Generally, two error measures are used when correcting raw p-values for this problem: the family-wise error rate (FWER) and the false discovery rate (FDR).

### Family-wise error rate (FWER)

The family-wise error rate (FWER) is the probability of making at least one false positive prediction, or Type-I-Error, among all the tested null hypotheses.

$$\text{FWER} := Pr(|FP| > 0)$$

### False discovery rate (FDR)

The false discovery rate (FDR) can be defined as the expected proportion of incorrectly rejected null hypotheses among all rejected ones, multiplied by the probability of getting at least one significant result [1].

$$FDR=E\left(\frac{FP}{FP+TP} |\, (FP+TP) > 0\right)Pr((FP+TP) > 0)$$

In contrast to the FWER, the FDR tolerates a certain proportion of false discoveries among the rejected hypotheses [2].

Storey [3] proposes a similar definition called the positive false discovery rate (pFDR), which assumes that at least one test has a positive finding. Accordingly, the author argues that cases where no test is significant are not of importance.

$$pFDR=E\left(\frac{FP}{FP+TP} |\, (FP+TP) > 0\right)$$

### FDR or FWER?

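Under the assumption that all null hypotheses are true, the FDR and the FWER are equivalent: every rejection is then a false positive, so $FP = (FP + TP)$ and the ratio $\frac{FP}{FP+TP}$ is either 1 or, by convention when nothing is rejected, 0. This gives us [4]: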

$$FDR = E\left(\frac{FP}{FP+TP}\right) = 1 \times Pr\left(\frac{FP}{FP+TP} = 1\right) = Pr((FP+TP) > 0)$$

If this is not the case, FDR-controlling adjustments are less strict than adjustments controlling the family-wise error rate and thus generally have more statistical power [4], [5].

In the following sections, methods are described that can be used to control the FWER and the FDR under certain assumptions. Since each method relies on its own assumptions (for example, independence of the p-values), the method should be chosen to match the data at hand.

### Familywise error rate controlling p-value adjustments

#### Bonferroni

The Bonferroni method [6], [7], [8] multiplies every p-value by the number of tested null hypotheses $n$. This adjustment is very conservative but controls the familywise error rate without any restrictions [4].

$$\tilde p_{i}\ =\ np_{i}$$
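
The adjustment is a single multiplication per p-value, as the following minimal sketch shows. The function name `bonferroni` is hypothetical, and capping the result at 1 is an added convention that is not part of the formula above.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of tests."""
    n = len(p_values)
    # Assumption: cap at 1.0 so that adjusted values remain valid probabilities.
    return [min(1.0, n * p) for p in p_values]


print(bonferroni([0.01, 0.02, 0.5]))
```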

#### Sidak

The Sidak method [6], [9] is less strict than the corresponding Bonferroni adjustment but imposes some restrictions in order to achieve this [10]. This adjustment is only guaranteed to control the familywise error rate when all of the p-values are uniformly distributed and independent [11], [4]. If this is the case, the Sidak method has more statistical power than the Bonferroni method.

$$\tilde p_{i}\ =\ 1 - ( 1 - p_{i})^{n}$$
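
A minimal sketch of this formula follows; the function name `sidak` is hypothetical. No cap is needed here because the expression never exceeds 1 for p-values in $[0, 1]$.

```python
def sidak(p_values):
    """Sidak adjustment: 1 - (1 - p)^n for each raw p-value."""
    n = len(p_values)
    return [1.0 - (1.0 - p) ** n for p in p_values]


print(sidak([0.01, 0.02, 0.5]))
```

For the first p-value in this example the result is slightly below the Bonferroni value of $3 \times 0.01 = 0.03$, which illustrates why the Sidak adjustment is the less strict of the two.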

#### Step-down methods

Step-down methods were first introduced by Holm [12]. These procedures examine p-values in order, from smallest to largest; in the formulas below, $p_{1} \le \dots \le p_{n}$ denote the raw p-values sorted in ascending order. If, after correction, a p-value is smaller than its predecessor, it obtains the value of its predecessor. This ensures that the adjusted p-values are monotone in that order. The benefit of using step-down methods is that the tests are made more powerful (smaller adjusted p-values) while, in most cases, maintaining strong control of the familywise error rate [4].

##### Holm

The Holm adjustment [12] is a step-down approach for the Bonferroni method.

$$\tilde p_{i}\ =\ \begin{cases} np_{i} & \text{for } i=1\\ \max \left( \tilde p_{(i-1)}, (n - i +1) p_{i} \right) & \text{for }i=2,...,n \end{cases}$$
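
The recursion can be sketched as a single pass over the sorted p-values that carries a running maximum. The function name `holm` is hypothetical, the results are returned in the original input order, and the cap at 1 is an added convention.

```python
def holm(p_values):
    """Holm step-down adjustment; results are returned in the input order."""
    n = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(n), key=lambda j: p_values[j])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, j in enumerate(order, start=1):
        candidate = (n - rank + 1) * p_values[j]
        # Never fall below the adjusted value of the predecessor.
        running_max = max(running_max, candidate)
        # Assumption: cap at 1.0 so that adjusted values stay probabilities.
        adjusted[j] = min(1.0, running_max)
    return adjusted


print(holm([0.01, 0.04, 0.03]))
```

In this example the largest p-value (0.04) would receive the candidate value $1 \times 0.04$, but it inherits the larger value $2 \times 0.03 = 0.06$ from its predecessor.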

##### Holm-Sidak

The Holm-Sidak adjustment [12], [9] is a step-down approach for the Sidak method.

$$\tilde p_{i}\ =\begin{cases} 1-(1-p_{i})^{(n)} & \text{for } i=1\\ \max \left( \tilde p_{(i-1)}, 1-(1-p_{i})^{(n-i+1)} \right) & \text{for }i=2 ,...,n \end{cases}$$
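
A minimal sketch of this recursion, using the hypothetical function name `holm_sidak`: the only change compared with a Bonferroni-based step-down is the Sidak expression with exponent $n-i+1$.

```python
def holm_sidak(p_values):
    """Holm-Sidak step-down adjustment; results are returned in the input order."""
    n = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(n), key=lambda j: p_values[j])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, j in enumerate(order, start=1):
        candidate = 1.0 - (1.0 - p_values[j]) ** (n - rank + 1)
        # Never fall below the adjusted value of the predecessor.
        running_max = max(running_max, candidate)
        adjusted[j] = running_max
    return adjusted


print(holm_sidak([0.01, 0.04, 0.03]))
```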

##### Finner

The Finner method [13], [14] is a step-down approach for a slightly adapted Sidak method.

$$\tilde p_{i}\ =\ \begin{cases} 1-(1-p_{i})^{(n)} & \text{for } i=1\\ \max \left( \tilde p_{(i-1)}, 1-(1-p_{i})^{(\frac{n}{i})} \right) & \text{for }i=2 ,...,n \end{cases}$$
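
The following is a minimal sketch of a Finner-style step-down pass under a stated assumption: it applies the Sidak-type expression with exponent $\frac{n}{i}$ at every rank, including $i=1$, where it reduces to $1-(1-p_{1})^{n}$. The function name `finner` is hypothetical.

```python
def finner(p_values):
    """Finner-style step-down adjustment; results are returned in the input order."""
    n = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(n), key=lambda j: p_values[j])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, j in enumerate(order, start=1):
        # Assumption: exponent n / rank is used for every rank, including rank 1.
        candidate = 1.0 - (1.0 - p_values[j]) ** (n / rank)
        # Never fall below the adjusted value of the predecessor.
        running_max = max(running_max, candidate)
        adjusted[j] = running_max
    return adjusted


print(finner([0.01, 0.04, 0.03]))
```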

#### Step-up methods

In step-up methods, p-values are examined in order, from largest to smallest. If, after correction, a p-value is bigger than its predecessor (the next larger p-value, which was processed just before it), it obtains the value of its predecessor. This ensures that the adjusted p-values are monotone in that order.

##### Hochberg

The Hochberg adjustment [15] is a step-up approach for the Bonferroni method. Hochberg showed that Holm's step-down adjustments also control the familywise error rate even when calculated in step-up fashion. Since p-values adjusted by Hochberg's method are always smaller than or equal to p-values adjusted by Holm's method, the Hochberg method is more powerful [4].

$$\tilde p_{i}\ =\ \begin{cases} p_{i} & \text{for } i=n\\ \min \left( \tilde p_{(i+1)}, (n-i+1)p_{i} \right) & \text{for }i=n-1,...,1 \end{cases}$$
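
A minimal sketch of the step-up pass follows: it walks from the largest to the smallest p-value and carries a running minimum. The function name `hochberg` is hypothetical, and results are returned in the original input order.

```python
def hochberg(p_values):
    """Hochberg step-up adjustment; results are returned in the input order."""
    n = len(p_values)
    # Indices of the p-values, largest p-value first.
    order = sorted(range(n), key=lambda j: p_values[j], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, j in enumerate(order):
        rank = n - offset  # rank i of this p-value in ascending order
        candidate = (n - rank + 1) * p_values[j]
        # Never exceed the adjusted value of the next larger p-value.
        running_min = min(running_min, candidate)
        adjusted[j] = running_min
    return adjusted


print(hochberg([0.01, 0.04, 0.03]))
```

For these three p-values the two larger ones are both adjusted to 0.04, whereas a Bonferroni-based step-down pass would assign them $2 \times 0.03 = 0.06$.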

### False discovery rate controlling p-value adjustments

#### Benjamini-Hochberg

The Benjamini-Hochberg method [16], [5] is a step-up approach to control the false discovery rate. It assumes all p-values to be independent.

$$\tilde p_{i}\ =\ \begin{cases} p_{i} & \text{for } i=n\\ \min \left( \tilde p_{(i+1)}, \frac{n}{i}p_{i} \right) & \text{for }i=n-1 ,...,1 \end{cases}$$
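
As a minimal sketch, the adjustment can be computed by scaling each p-value by $\frac{n}{i}$ and carrying a running minimum from the largest p-value downwards. The function name `benjamini_hochberg` is hypothetical, and results are returned in the original input order.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg step-up adjustment; results are in the input order."""
    n = len(p_values)
    # Indices of the p-values, largest p-value first.
    order = sorted(range(n), key=lambda j: p_values[j], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, j in enumerate(order):
        rank = n - offset  # rank i of this p-value in ascending order
        candidate = n / rank * p_values[j]
        # Never exceed the adjusted value of the next larger p-value.
        running_min = min(running_min, candidate)
        adjusted[j] = running_min
    return adjusted


print(benjamini_hochberg([0.9, 0.001, 0.041, 0.008, 0.039]))
```

In this example the raw p-value 0.039 has the candidate value $\frac{5}{3} \times 0.039 = 0.065$, but it inherits the smaller value $\frac{5}{4} \times 0.041 = 0.05125$ from the next larger p-value.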

#### Benjamini-Yekutieli

The Benjamini-Yekutieli method is an extension of the Benjamini-Hochberg adjustment that can also be applied when p-values are dependent [17]. It controls the false discovery rate under arbitrary dependence, but is quite conservative as a result [4].

$$\gamma = \sum_{i=1}^{n} \frac{1}{i}$$

$$\tilde p_{i}\ =\ \begin{cases} \gamma p_{i} & \text{for } i=n\\ \min \left( \tilde p_{(i+1)}, \gamma \frac{n}{i}p_{i} \right) & \text{for }i=n-1 ,...,1 \end{cases}$$
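
A minimal sketch of this adjustment: it is the same step-up pass as for a $\frac{n}{i}$ scaling, with every candidate additionally multiplied by the harmonic sum $\gamma$. The function name `benjamini_yekutieli` is hypothetical, and capping at 1 (through the initial running minimum) is an added convention.

```python
def benjamini_yekutieli(p_values):
    """Benjamini-Yekutieli step-up adjustment; results are in the input order."""
    n = len(p_values)
    gamma = sum(1.0 / i for i in range(1, n + 1))  # harmonic sum 1 + 1/2 + ... + 1/n
    # Indices of the p-values, largest p-value first.
    order = sorted(range(n), key=lambda j: p_values[j], reverse=True)
    adjusted = [0.0] * n
    # Assumption: starting at 1.0 caps all adjusted values at 1.
    running_min = 1.0
    for offset, j in enumerate(order):
        rank = n - offset  # rank i of this p-value in ascending order
        candidate = gamma * n / rank * p_values[j]
        # Never exceed the adjusted value of the next larger p-value.
        running_min = min(running_min, candidate)
        adjusted[j] = running_min
    return adjusted


print(benjamini_yekutieli([0.01, 0.04, 0.03]))
```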

### Bibliography

1. Storey, John D. A direct approach to false discovery rates. J. R. Statist. Soc. B.
2. brainder.org. False Discovery Rate: Corrected & Adjusted P-values. http://brainder.org/
3. Storey, John D. The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics.
4. SAS. p-Value Adjustments. SAS/STAT(R) 9.22 User's Guide.
5. Hochberg, Yosef and Benjamini, Yoav. More powerful procedures for multiple significance testing. Statistics in Medicine. Wiley Online Library.
6. Abdi, Herve. The Bonferonni and Sidak Corrections for Multiple Comparisons.
7. Bonferroni, C. E. Il calcolo delle assicurazioni su gruppi di teste. Studi in Onore del Professore Salvatore Ortu Carboni.
8. Bonferroni, C. E. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze.
9. Sidak, Zbynek. Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association. Taylor and Francis Group.
10. Westfall, Peter H. and Wolfinger, Russell D. Multiple tests with discrete distributions. The American Statistician. Taylor and Francis Group.
11. Holland, Burt S. and Copenhaver, Margaret DiPonzio. An improved sequentially rejective Bonferroni test procedure. Biometrics. JSTOR.
12. Holm, Sture. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. JSTOR.
13. Finner, H. On a monotonicity problem in step-down multiple test procedures. Journal of the American Statistical Association. Taylor and Francis Group.
14. Finner, Helmut. Some new inequalities for the range distribution, with application to the determination of optimum significance levels of multiple range tests. Journal of the American Statistical Association. Taylor and Francis Group.
15. Hochberg, Yosef. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. Biometrika Trust.
16. Benjamini, Yoav and Hochberg, Yosef. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological). JSTOR.
17. Benjamini, Yoav and Yekutieli, Daniel. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics. JSTOR.