Shalini Kaushik, Usha Chouhan, Ashok Dwivedi


Functional annotation of uncharacterized proteins is a major goal in proteomics, and a protein's subcellular localization serves as a key annotation. Many prediction techniques have been developed, but most emphasize a single biological aspect or predict only a subset of all localizations. Predicting protein localization, which is considered pivotal, requires gathering all the biologically relevant information and addressing the need to improve prediction accuracy. Proteins play an essential role in a wide range of bioprocesses, such as catalysis of biochemical reactions and signal transduction, and are required for cellular processes. The functions they perform can be analyzed by predicting their associated cellular locations. The localization of proteins can be examined by considering features of the primary sequence, such as the physicochemical and amino acid composition of the complete protein. The physicochemical composition of the C-terminal and N-terminal regions and other physicochemical properties of the primary sequence also contribute to subcellular localization. In this paper, three computational techniques, J48, best-first decision tree (BF Tree), and random forest, are employed for localization prediction and show a significant performance improvement over several other techniques. The models were trained on the latest integrated database together with obsolete data. Based on the features discussed, the three techniques achieved improved prediction accuracy compared with other techniques: 87.711% with J48, 81.67% with random forest, and 88.125% with BF Tree.
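The sequence-derived features mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 25-residue terminal window, and the use of the Kyte-Doolittle hydropathy scale as the single physicochemical property are assumptions made for illustration. The sketch computes the amino acid composition of the complete sequence and the mean hydropathy of the whole protein, its N-terminal region, and its C-terminal region.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aa_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def mean_hydropathy(seq):
    """Return the mean Kyte-Doolittle hydropathy, ignoring non-standard residues."""
    values = [KYTE_DOOLITTLE[a] for a in seq.upper() if a in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0


def feature_vector(seq, window=25):
    """Build a 23-value feature vector for one protein sequence.

    window is a hypothetical choice for the length of the N- and
    C-terminal regions; the paper's abstract does not specify one.
    """
    n_term = seq[:window]
    c_term = seq[-window:]
    return aa_composition(seq) + [
        mean_hydropathy(seq),
        mean_hydropathy(n_term),
        mean_hydropathy(c_term),
    ]
```

Vectors of this kind, paired with known localization labels, are the sort of input a decision-tree learner such as J48 (Weka's implementation of C4.5) or a random forest would take.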


Keywords: subcellular localization, classification tree, human proteins, physicochemical properties.





Copyright (c) 2017 Shalini Kaushik

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.