Principal components analysis: theory and application to gene expression data analysis

  • Hristo Todorov, Fresenius Kabi Deutschland GmbH
  • David Fournier, Johannes Gutenberg-Universität Mainz
  • Susanne Gerber, Johannes Gutenberg-Universität Mainz

Abstract

Advances in computational power have enabled researchers to generate large amounts of data on complex biological problems. Consequently, applying appropriate data analysis techniques has become paramount for tackling this complexity. However, a theoretical understanding of statistical methods is necessary to ensure that the correct method is used and that sound inferences are drawn from the analysis. In this article, we elaborate on the theory behind principal components analysis (PCA), which has become a favoured multivariate statistical tool in the field of omics-data analysis. We discuss the necessary prerequisites and steps to produce statistically valid results and provide guidelines for interpreting the output. Using PCA on gene expression data from a mouse experiment, we demonstrate that the main distinctive pattern in the data is associated with the transgenic mouse line and is not related to the sex of the mice. A weaker association of the pattern with the genotype was also identified.
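
As a rough illustration of the kind of analysis the abstract describes, the following minimal Python sketch computes principal components from the singular value decomposition of a column-centred data matrix. It is not the authors' pipeline: the function name `pca`, the synthetic 12-sample by 50-gene matrix, and the two-group design are all assumptions made for illustration only, and no real expression data are used.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the column-centred matrix X (rows = samples, columns = variables)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable (gene)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    variances = S**2 / (X.shape[0] - 1)          # variance captured by each component
    ratio = variances / variances.sum()          # proportion of total variance
    return scores, Vt[:n_components], ratio[:n_components]


# Synthetic stand-in for expression data (hypothetical): 12 samples, 50 genes,
# where the first 10 genes are shifted in the second group of 6 samples.
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 6)
X = rng.normal(size=(12, 50))
X[:, :10] += 3.0 * group[:, None]

scores, loadings, ratio = pca(X)
```

In this toy setting the first column of `scores` can be plotted against the second and coloured by `group` to check whether the dominant pattern follows the grouping, which mirrors how the association with mouse line, sex, or genotype would be inspected. Whether variables should also be scaled to unit variance before the decomposition depends on the data and is left out of this sketch.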

Published
2018-01-30
How to Cite
TODOROV, Hristo; FOURNIER, David; GERBER, Susanne. Principal components analysis: theory and application to gene expression data analysis. Genomics and Computational Biology, [S.l.], v. 4, n. 2, p. e100041, jan. 2018. ISSN 2365-7154. Available at: <https://genomicscomputbiol.org/ojs3/GCB/article/view/41>. Date accessed: 17 sep. 2019. doi: https://doi.org/10.18547/gcb.2018.vol4.iss2.e100041.