Investigating the Precise Identification of Citrullination Sites with High-Performance Score Metrics Using a Powerful Computation Predicting Tool
- Authors: Ahmed F.1, Podder A.1, Bulbul M.1, Hossain M.2, Hasan M.3, Sarkar M.4, Kim D.5
- Affiliations:
- Department of Mathematics, Jashore University of Science and Technology
- Department of Electrical and Electronic Engineering, Jashore University of Science and Technology
- Department of Computer Science and Engineering, Jashore University of Science and Technology
- Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology
- Department of Computer Science & Engineering, Pohang University of Science and Technology (POSTECH)
- Issue: Vol. 27, No. 9 (2024)
- Pages: 1381-1393
- Section: Chemistry
- URL: https://kazanmedjournal.ru/1386-2073/article/view/645006
- DOI: https://doi.org/10.2174/1386207326666230912151932
- ID: 645006
Abstract
Background: To elucidate the detailed mechanisms of citrullination at the molecular level and to design drugs applicable to major human diseases, predicting protein citrullination sites (PCSs) is essential. Identifying PCSs experimentally is time-consuming and costly. However, current PCS predictors are limited in scope. In particular, most commonly used predictors achieve only limited performance scores.
Objective: This work aims to provide an improved, more sophisticated predictor of citrullination sites using a benchmark dataset on a machine learning platform.
Methods: This study presents a reliable citrullination site predictor based on a benchmark dataset containing a 1:1 ratio of positive and negative samples. We classified citrullination sites using Composition of k-Spaced Amino Acid Pairs (CKSAAP) features and a Support Vector Machine (SVM) classifier.
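As an illustration of the CKSAAP encoding named in the Methods paragraph, the following is a minimal sketch, not the authors' implementation. It assumes the 20 standard amino acids, skips pairs containing any other character, and normalizes each count by the number of pair positions in the window. The function name `cksaap`, the default `k_max=2`, and the example window are hypothetical choices; the abstract does not state the window length or the range of k that was used.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are encoded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window, k_max=2):
    """Return the CKSAAP feature vector for one sequence window.

    For each gap k in 0..k_max, count every residue pair separated by
    k positions and divide by the number of pair positions in the window.
    The result has 400 * (k_max + 1) entries.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs with non-standard characters are skipped
                counts[pair] += 1
        denom = max(n_positions, 1)  # guard against windows shorter than k + 2
        features.extend(counts[p] / denom for p in PAIRS)
    return features


if __name__ == "__main__":
    vec = cksaap("ARARA", k_max=1)  # toy window, not from the benchmark dataset
    print(len(vec))
    print(vec[PAIRS.index("AR")])
```

Vectors of this kind, one per candidate arginine-centered window, would then be passed to a classifier such as an SVM.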
Results: We developed PCS predictors using integrated machine-learning methods that produced the highest average scores. Using 10-fold cross-validation on test datasets, the True Positive Rate (TPR) was 98.34%, the True Negative Rate (TNR) was 99.44%, the accuracy was 98.89%, the Matthews Correlation Coefficient (MCC) was 98.21%, the Area Under the ROC Curve (AUC) was 0.999, and the partial Area Under the ROC Curve (pAUC) was 0.1968.
Conclusion: In overall performance, our predictor performs significantly better than current tools on the same benchmark dataset. Moreover, it showed better performance metrics on both test and training datasets. The predictor is promising and can serve as a complementary technique for fast and precise identification of citrullination sites.
About the authors
Fee Ahmed
Department of Mathematics, Jashore University of Science and Technology
Corresponding author.
Email: info@benthamscience.net
Anamika Podder
Department of Mathematics, Jashore University of Science and Technology
Email: info@benthamscience.net
Md. Bulbul
Department of Mathematics, Jashore University of Science and Technology
Email: info@benthamscience.net
Md. Hossain
Department of Electrical and Electronic Engineering, Jashore University of Science and Technology
Email: info@benthamscience.net
Mahedi Hasan
Department of Computer Science and Engineering, Jashore University of Science and Technology
Email: info@benthamscience.net
Md. Sarkar
Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology
Email: info@benthamscience.net
Daijin Kim
Department of Computer Science & Engineering, Pohang University of Science and Technology (POSTECH)
Corresponding author.
Email: info@benthamscience.net
