Show simple item record

dc.contributor.author: Obunge, Duncan Ogindo
dc.date.accessioned: 2026-02-12T09:04:48Z
dc.date.available: 2026-02-12T09:04:48Z
dc.date.issued: 2025-09
dc.identifier.uri: http://repository.kemu.ac.ke/handle/123456789/2153
dc.description.abstract: The rising prevalence of diabetes mellitus (DM), a leading global health challenge, is placing a significant burden on healthcare systems. Accurate and interpretable classification models are crucial for early diagnosis and effective intervention. Traditional machine learning techniques such as Extreme Gradient Boosting (XGBoost) and RandomForest-based models have demonstrated robust classification performance on tabular medical datasets, but they continue to face challenges of model interpretability. Deep learning models such as TabNet offer the twofold benefit of learned feature selection and interpretability via attention mechanisms. This study developed a weighted ensemble model that combines TabNet, XGBoost and RandomForest-based models to address the trade-off between interpretability and strong performance. The study used the Pima Indian Diabetes dataset as secondary data, together with expert clinical validation. The dataset contains 768 records with 8 features related to diabetes risk factors. The ensemble assigns optimized weights to the classifications of the three models, drawing on their complementary strengths. The results indicated that the weighted ensemble model outperformed the individual models while preserving interpretability. The ensemble achieved a balanced accuracy of 0.8630 ± 0.0146 (median 0.8350), precision of 0.8163 ± 0.0442 (median 0.8018), recall of 0.9376 ± 0.0341 (median 0.8900), F1 score of 0.8401 ± 0.0110 (median 0.8436), and ROC-AUC score of 0.9026 ± 0.0172 (median 0.9044). By comparison, XGBoost attained a balanced accuracy of 0.8103 ± 0.0270 (median 0.8150) and precision of 0.7888 ± 0.0287 (median 0.7890), and RandomForest achieved a balanced accuracy of 0.8060 ± 0.0250 (median 0.8100) and precision of 0.7451 ± 0.0312 (median 0.7768).
Feature importance analysis identified the most significant predictors of diabetes, by normalized score, as glucose level (≈1), followed by age (≈0.458), insulin (≈0.434) and body mass index (BMI) (≈0.13), providing valuable clinical insights. This research contributes a novel computational framework that leverages weighted ensemble learning while preserving model explainability, a critical advancement for healthcare-aligned machine learning systems. This methodological contribution extends beyond diabetes classification and could benefit various clinical decision support systems operating on limited-feature medical datasets. [en_US]
dc.language.iso: en [en_US]
dc.publisher: KeMU [en_US]
dc.subject: TabNet [en_US]
dc.subject: Diabetes Classification [en_US]
dc.subject: Weighted Ensemble Learning [en_US]
dc.title: Enhancing Diabetes Classification Using A Weighted Ensemble Of Tabnet, Xgboost And Random Forest [en_US]
dc.type: Thesis [en_US]
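
The abstract describes an ensemble that assigns optimized weights to the outputs of three classifiers. The record does not say how the weights were optimized or how the outputs were combined, so the following is only a minimal sketch under stated assumptions: it blends per-model positive-class probabilities (soft voting) and picks weights by a coarse grid search that maximizes balanced accuracy. The function names `weighted_vote`, `balanced_accuracy` and `search_weights` are hypothetical, and the toy probabilities are illustrative values, not results from the thesis.

```python
from itertools import product


def weighted_vote(probas, weights):
    """Blend per-model positive-class probabilities using normalized weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0 or 1)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def search_weights(probas, y_true, step=0.1, threshold=0.5):
    """Grid-search weights that sum to 1 (assumed optimizer, not the thesis's)."""
    steps = round(1 / step)
    best_weights, best_score = None, -1.0
    for combo in product(range(steps + 1), repeat=len(probas)):
        if sum(combo) != steps:
            continue  # keep only weight vectors on the simplex
        weights = [c / steps for c in combo]
        blended = weighted_vote(probas, weights)
        preds = [1 if b >= threshold else 0 for b in blended]
        score = balanced_accuracy(y_true, preds)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Toy validation data: made-up probabilities standing in for the three models.
y_true = [1, 0, 1, 0]
tabnet_proba = [0.9, 0.4, 0.3, 0.2]
xgboost_proba = [0.6, 0.7, 0.8, 0.1]
forest_proba = [0.7, 0.3, 0.6, 0.6]

weights, score = search_weights(
    [tabnet_proba, xgboost_proba, forest_proba], y_true
)
print("weights:", weights, "balanced accuracy:", score)
```

In practice the weights would be tuned on held-out folds rather than on the data used for the final evaluation, so that the reported scores are not optimistically biased.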

