CYTOLOGICAL FEATURE-BASED MACHINE LEARNING CLASSIFICATION OF BENIGN AND MALIGNANT BREAST TUMORS USING THE WISCONSIN DIAGNOSTIC BREAST CANCER

Authors

  • Dr. Hannah E. Collins
  • Dr. Victor Ramirez Ortega
  • Dr. Mei-Ling Zhou
  • Dr. Stefan Müller
  • Dr. Rania El-Haddad

Abstract

Breast cancer remains a serious global health problem, and timely diagnosis and treatment planning depend on distinguishing benign from malignant tumors. This study aimed to classify benign and malignant breast tumors from cytological features using supervised machine learning techniques. The dataset contained 569 tumor records and 30 numerical cytological features derived from breast tumor cell nuclei. After non-informative columns were removed, the diagnosis variable was encoded into benign and malignant classes, and the numerical features were standardized. Six machine learning (ML) models were applied: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, and Naive Bayes. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. The descriptive results showed that malignant tumors had higher mean values for radius, perimeter, area, concavity, and concave points, indicating greater nuclear size and structural irregularity. All six models performed strongly. Random Forest and Support Vector Machine achieved the highest accuracy (0.9737) and F1-score (0.9630), while Logistic Regression achieved the highest ROC-AUC (0.9960). These findings indicate that cytological features provide strong diagnostic separation between benign and malignant tumors. The study concludes that ML models can serve as effective decision-support tools for breast tumor classification, although external validation is required before clinical application.
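
As an illustration of the standardization and evaluation steps named in the abstract, the following is a minimal sketch in standard-library Python. It is not the authors' code: the helper names standardize and binary_metrics are hypothetical, the toy labels are invented for the example, and coding malignant as 1 (the positive class) is an assumption. ROC-AUC is left out because it requires predicted scores rather than hard class labels.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score one feature column: subtract the mean, divide by the population std."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (assumed: malignant = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented toy data, NOT taken from the study: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)

# Standardizing a toy feature column gives mean 0 and unit population std.
z = standardize([1.0, 2.0, 3.0])
```

In a diagnostic setting, recall on the malignant class is usually the metric to watch, since a false negative corresponds to a missed malignancy. That is why the sketch takes the positive class as an explicit parameter.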

Published

2025-03-27

How to Cite

Collins, H. E., Ramirez Ortega, V., Zhou, M.-L., Müller, S., & El-Haddad, R. (2025). CYTOLOGICAL FEATURE-BASED MACHINE LEARNING CLASSIFICATION OF BENIGN AND MALIGNANT BREAST TUMORS USING THE WISCONSIN DIAGNOSTIC BREAST CANCER. International Journal For Research In Biology & Pharmacy, 11(1), 22–30. Retrieved from https://bp.gpubjournal.com/index.php/bp/article/view/2504