Bayesian Inference for Single-cell Clustering and Imputing

Elham Azizi, Sandhya Prabhakaran, Ambrose Carr, Dana Pe'er

GCB, Vol. 3, No. 1 (2017): e46


Abstract


Single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data are noisy because of experimental errors and cell type-specific biases. Current computational approaches for analyzing single-cell data rely on a global normalization step, which introduces biases and spurious noise and does not resolve missing data (dropouts). This can lead to misleading conclusions in downstream analyses. Moreover, applying a single normalization to all cells removes important cell type-specific information. We propose a data-driven model, BISCUIT, that iteratively normalizes and clusters cells, thereby separating noise from biological signal. BISCUIT is a Bayesian probabilistic model that learns cell-specific parameters and uses them to drive normalization. On both synthetic and real single-cell data, this approach outperforms previous methods that apply global normalization followed by clustering, and it allows easy interpretation and recovery of the underlying structure and cell types.
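
The claim that a single global normalization can remove cell type-specific information can be illustrated with a small sketch. The code below is not BISCUIT and is not taken from the paper; it assumes a generic library-size scaling as a stand-in for "global normalization", and the function name `global_normalize` and the toy count matrix are hypothetical.

```python
import numpy as np

def global_normalize(counts):
    """Scale every cell (row) to the median library size.

    Generic library-size normalization, used only as an illustrative
    stand-in for "global normalization"; this is not the BISCUIT model.
    Assumes every cell has at least one count (no zero library sizes).
    """
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)  # total counts per cell
    target = np.median(library_size)                  # one shared target size
    return counts / library_size * target

# Hypothetical toy data: rows are cells, columns are genes.
# Type B cells express every gene at twice the level of type A cells.
type_a = np.array([[10, 20, 30], [10, 20, 30]])
type_b = 2 * type_a
counts = np.vstack([type_a, type_b])

normalized = global_normalize(counts)

# After scaling, all four rows coincide, so the two-fold difference
# between the hypothetical cell types is no longer visible.
rows_identical = np.allclose(normalized, normalized[0])
```

In this toy case the two hypothetical cell types become indistinguishable after scaling, which is the kind of information loss that motivates learning cell-specific parameters jointly with the clustering.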

Keywords


single-cell; RNA-seq; clustering; normalization; imputing; dropouts


DOI: http://dx.doi.org/10.18547/gcb.2017.vol3.iss1.e46

Copyright (c) 2017 Elham Azizi, Sandhya Prabhakaran, Ambrose Carr, Dana Pe'er

This work is licensed under a Creative Commons Attribution 4.0 International License.