Epithelial Mesenchymal Transition Network-Based Feature Engineering in Lung Adenocarcinoma Prognosis Prediction Using Multiple Omic Data

Borong Shao, Carlo Vittorio Cannistraci, Tim O. F. Conrad

GCB, Vol. 3, No. 3 (2017): e57


The epithelial-mesenchymal transition (EMT) process has been shown to be highly relevant to cancer prognosis. However, although various biological network-based biomarker identification methods have been proposed for predicting cancer prognosis, the EMT network has not been used directly for this purpose. In this study, we constructed an EMT regulatory network consisting of 87 molecules and attempted to select features useful for prognosis prediction in lung adenocarcinoma (LUAD). To incorporate multiple molecular profiles, we obtained four types of molecular data from The Cancer Genome Atlas: mRNA-Seq, copy number alteration (CNA), DNA methylation, and miRNA-Seq. The data were mapped to the EMT network in three alternative ways: mRNA-Seq and miRNA-Seq, DNA methylation, and CNA and miRNA-Seq. Each mapping was used to extract five different feature sets using discretization and network-based biomarker identification methods. Each feature set was then used to predict prognosis with SVM and logistic regression classifiers. We measured prediction accuracy with AUC and AUPR values using 10 repetitions of 10-fold cross-validation. For a more comprehensive evaluation, we also measured the prediction accuracies of clinical features, EMT plus clinical features, 87 randomly picked molecules from each data mapping, and all molecules from each data type. Counter-intuitively, EMT features do not always outperform randomly selected features, and the prediction accuracies of the five feature sets are mostly not significantly different. Clinical features give the highest prediction accuracies. In addition, the prediction accuracies of both EMT features and random features are comparable to those obtained using all features (more than 17,000) from each data type.
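
The evaluation protocol described in the abstract (AUC and AUPR averaged over repeated 10-fold cross-validation) can be illustrated with a minimal standard-library sketch. It is not the authors' code. The synthetic data, the function names, and the nearest-centroid scorer are all assumptions made for illustration. The scorer is only a stand-in for the SVM and logistic regression classifiers used in the study, and stratifying the folds by class is likewise an assumption.

```python
import random
from statistics import mean


def auc_roc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aupr(labels, scores):
    """Area under the precision-recall curve as average precision (tied scores are not merged)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at each recalled positive
    return total / sum(labels)


def repeated_stratified_folds(labels, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for `repeats` reshuffled rounds of stratified k-fold."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in (0, 1):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[j % k].append(i)  # deal each class round-robin across folds
        for test_idx in folds:
            held_out = set(test_idx)
            yield [i for i in range(len(labels)) if i not in held_out], test_idx


def centroid_scorer(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: higher score means closer to the class-1 centroid."""
    def centroid(cls):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        return [mean(col) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    return [sqdist(x, c0) - sqdist(x, c1) for x in X_test]


def evaluate(X, y, scorer, k=10, repeats=10, seed=0):
    """Mean AUC and mean AUPR over all k * repeats held-out folds."""
    aucs, auprs = [], []
    for train_idx, test_idx in repeated_stratified_folds(y, k, repeats, seed):
        scores = scorer([X[i] for i in train_idx], [y[i] for i in train_idx],
                        [X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        aucs.append(auc_roc(y_test, scores))
        auprs.append(aupr(y_test, scores))
    return mean(aucs), mean(auprs)


# Synthetic data (assumption): 60 samples, 5 features, class means shifted by 1.
data_rng = random.Random(1)
y = [0] * 30 + [1] * 30
X = [[data_rng.gauss(label, 1.0) for _ in range(5)] for label in y]
mean_auc, mean_aupr = evaluate(X, y, centroid_scorer)
print(f"mean AUC = {mean_auc:.3f}, mean AUPR = {mean_aupr:.3f}")
```

Running the same `evaluate` call on different feature sets (for example, the 87 EMT molecules versus 87 randomly chosen ones) with the same seed would keep the fold assignments identical across feature sets, which makes the comparison fairer.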


Epithelial mesenchymal transition; data integration; subnetwork; feature selection; prognosis prediction; Lung adenocarcinoma

DOI: http://dx.doi.org/10.18547/gcb.2017.vol3.iss3.e57

Copyright (c) 2017 Borong Shao, Carlo Vittorio Cannistraci, Tim Conrad

This work is licensed under a Creative Commons Attribution 4.0 International License.