Early Prediction of Stroke Risk Using Machine Learning Approaches and Imbalanced Data
DOI:
https://doi.org/10.56286/1vf19469
Keywords:
Decision Tree, Imbalanced Data, KNN, LDA, Naïve Bayes, Machine Learning Models
Abstract
Classifying medical datasets with machine learning algorithms can help physicians provide accurate diagnoses and suitable treatment. Stroke, for instance, is a serious disease that affects many patients every year, and analyzing its symptoms in advance could save patients' lives. The warning signs of stroke can be used as attributes, or predictors, for machine learning models. This study evaluates the performance of four machine learning models in classifying stroke data. Specifically, Decision Tree, Naïve Bayes, K-Nearest Neighbor (KNN), and Linear Discriminant Analysis (LDA) models were trained on 11 attributes collected from 5110 patients to predict stroke risk. The findings showed that KNN outperformed the other three models, achieving an accuracy of 90%. The data were also balanced before the models were validated, to provide accurate classification. Cross-validation was used to avoid over-fitting and under-fitting during the training phases.
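The combination of class balancing and cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation. The data are synthetic (11 features, with a minority class standing in for stroke cases), random oversampling is assumed as the balancing method because the abstract does not name one, and only a simple KNN classifier is shown. All function names (`make_imbalanced`, `oversample`, `knn_predict`, `cross_validate`) are hypothetical helpers written for this example.

```python
import random
from collections import Counter

def make_imbalanced(n=400, minority_frac=0.1, n_features=11, seed=0):
    """Synthetic stand-in data: class 1 (minority) is shifted by +1.5 on every feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_frac else 0
        X.append([rng.gauss(1.5 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y

def oversample(X, y, rng):
    """Random oversampling: duplicate minority rows until every class matches the largest."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            Xb.append(row)
            yb.append(label)
    return Xb, yb

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def cross_validate(X, y, n_folds=5, k=5, seed=0):
    """k-fold CV; each training fold is balanced, each test fold is left untouched."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        X_tr, y_tr = oversample([X[i] for i in train_idx], [y[i] for i in train_idx], rng)
        correct = sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

X, y = make_imbalanced()
scores = cross_validate(X, y)
print("fold accuracies:", [round(s, 3) for s in scores])
print("mean accuracy:", round(sum(scores) / len(scores), 3))
```

In this sketch the oversampling is applied only to the training portion of each fold, so duplicated minority rows never appear in the held-out fold. That is a design choice of the example, not something the abstract specifies. Accuracy alone can also be misleading on imbalanced data, so per-class metrics are worth reporting alongside it.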
License
Copyright (c) 2025 NTU Journal of Engineering and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.