CYTOLOGICAL FEATURE-BASED MACHINE LEARNING CLASSIFICATION OF BENIGN AND MALIGNANT BREAST TUMORS USING THE WISCONSIN DIAGNOSTIC BREAST CANCER DATASET
Abstract
Breast cancer remains a serious global health problem, and timely diagnosis and treatment planning depend on distinguishing benign from malignant tumors. This study aimed to classify benign and malignant breast tumors from cytological features using supervised machine learning (ML) techniques. The dataset contained 569 tumor records and 30 numerical cytological features derived from breast tumor cell nuclei. After non-informative columns were removed, the diagnosis variable was encoded into benign and malignant classes, and the numerical features were standardized. Six ML models were applied: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, and Naive Bayes. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. The descriptive results showed that malignant tumors had higher mean values for radius, perimeter, area, concavity, and concave points, indicating greater nuclear size and structural irregularity. All six models performed strongly. Random Forest and Support Vector Machine achieved the highest accuracy (0.9737) and F1-score (0.9630), while Logistic Regression achieved the highest ROC-AUC (0.9960). These findings indicate that cytological features provide strong diagnostic separation between benign and malignant tumors. The study concludes that ML models can serve as effective decision-support tools for breast tumor classification, although external validation is required before clinical application.
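The workflow summarized in the abstract (encode the diagnosis, standardize the features, fit six classifiers, score them on five metrics) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code. It assumes scikit-learn and uses the library's bundled copy of the Wisconsin Diagnostic dataset, which already omits the non-informative columns. The 80/20 stratified split, the random seed, the default hyperparameters, and the helper name `evaluate_models` are all assumptions, since the abstract does not state them, so the scores it prints will not necessarily match the reported ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(test_size=0.2, seed=42):
    """Fit six classifiers on standardized features and score each one.

    test_size and seed are assumed values, not taken from the study.
    """
    data = load_breast_cancer()
    X = data.data  # 569 records x 30 numerical features
    # scikit-learn encodes malignant as 0 and benign as 1;
    # flip the labels so that malignant is the positive class (1).
    y = 1 - data.target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Default hyperparameters throughout (an assumption).
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Support Vector Machine": SVC(probability=True, random_state=seed),
        "K-Nearest Neighbors": KNeighborsClassifier(),
        "Naive Bayes": GaussianNB(),
    }

    results = {}
    for name, clf in models.items():
        # The scaler sits inside the pipeline, so it is fit on the
        # training split only and test data never leaks into it.
        pipe = make_pipeline(StandardScaler(), clf)
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        proba = pipe.predict_proba(X_test)[:, 1]  # P(malignant)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred),
            "recall": recall_score(y_test, pred),
            "f1": f1_score(y_test, pred),
            "roc_auc": roc_auc_score(y_test, proba),
        }
    return results


results = evaluate_models()
for name, metrics in results.items():
    row = "  ".join(f"{key}={value:.4f}" for key, value in metrics.items())
    print(f"{name:<24}{row}")
```

Keeping the scaler inside the pipeline is a deliberate choice in this sketch: scaling the full dataset before splitting would let test-set statistics influence training and could inflate the reported metrics.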
Copyright (c) 2025 International Journal For Research In Biology & Pharmacy

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.



