Gene Expression Profile Classification and Meaningful Discovery Using Non-Negative Matrix Factorization

Maqsood Hayat, Nadeem Iqbal, Mohammad Sohail, Muhammad Noman Hayat


Accurate classification of cancer types or subtypes is of great importance for better treatment and prognosis. Microarray technology, which can simultaneously monitor the expression of all genes in the genome, can be used to diagnose and classify cancer types in a systematic and objective fashion. In this paper we propose a model for the classification of microarray data through Non-Negative Matrix Factorization (NMF). NMF is used for feature extraction, and K-Nearest Neighbor (KNN) is used as the classification algorithm to classify cancer from microarray data. The NMF approach is based on a decomposition into parts that yields two factors. The first, the 'encoding matrix', contains the sample information and helps to classify the samples according to their similar gene expression levels. The second, the 'basis matrix', contains the gene expression levels and helps to reduce the dimension and to discover meaningful genes for a given disease. Two benchmark datasets, Leukemia and Colon, are used to evaluate the proposed model. The proposed model achieved promising accuracy: 97% on the Leukemia dataset and 91% on the Colon dataset. In addition, NMF is used to identify the meaningful genes that are useful for classifying cancer-related microarray data.
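The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say which NMF algorithm was used, so the multiplicative update rule below is an assumption, as are the toy expression values, the number of factors (`n_factors = 2`), the number of neighbors (`k=3`), and all function names. The sketch factors a genes-by-samples matrix into a basis matrix `W` and an encoding matrix `H`, classifies unseen samples with KNN on their encodings, and lists the highest-weighted genes in each basis column.

```python
import random
from collections import Counter

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=300, seed=0, W_fixed=None, eps=1e-9):
    """Approximate V (genes x samples) by W (genes x k) times H (k x samples).

    Multiplicative updates (an assumed choice) keep every entry non-negative.
    If W_fixed is given, only H is fitted, which encodes unseen samples.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    if W_fixed is not None:
        W = W_fixed
    else:
        W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        if W_fixed is None:
            Ht = transpose(H)
            num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
            W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
                 for i in range(n)]
    return W, H

def knn_predict(train_codes, train_labels, code, k=3):
    """Majority vote among the k training codes closest in Euclidean distance."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(c, code)), label)
        for c, label in zip(train_codes, train_labels)
    )
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Invented toy data: 6 samples x 6 genes, two classes with opposite patterns.
train_samples = [
    [9, 8, 9, 1, 0, 1], [8, 9, 8, 0, 1, 1], [9, 9, 7, 1, 1, 0],
    [1, 0, 1, 9, 8, 9], [0, 1, 1, 8, 9, 8], [1, 1, 0, 7, 9, 9],
]
train_labels = ["A", "A", "A", "B", "B", "B"]
test_samples = [[8, 8, 9, 1, 1, 0], [0, 1, 1, 9, 9, 8]]

n_factors = 2
W, H = nmf(transpose(train_samples), n_factors)   # basis and encoding matrices
train_codes = transpose(H)                        # one code per training sample
_, H_test = nmf(transpose(test_samples), n_factors, seed=1, W_fixed=W)
predictions = [knn_predict(train_codes, train_labels, c) for c in transpose(H_test)]

# Genes with the largest weight in each basis column (candidate "meaningful" genes).
top_genes = [sorted(range(len(W)), key=lambda g: -W[g][c])[:3]
             for c in range(n_factors)]
print(predictions, top_genes)
```

On real microarray data, with thousands of genes, a numerical library would normally replace the list-based matrix products used here; the lists only keep the sketch free of dependencies.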




Copyright © 2012. All rights reserved.

Creative Commons License
International Journal of Research in Computer and Communication Technology Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJRCCT. Permissions beyond the scope of this license may be available at