Enhancing Diabetes Classification Using A Weighted Ensemble Of TabNet, XGBoost And Random Forest

Obunge, Duncan Ogindo

dc.contributor.author	Obunge, Duncan Ogindo
dc.date.accessioned	2026-02-12T09:04:48Z
dc.date.available	2026-02-12T09:04:48Z
dc.date.issued	2025-09
dc.identifier.uri	http://repository.kemu.ac.ke/handle/123456789/2153
dc.description.abstract	The increasing prevalence of diabetes mellitus (DM), a leading global health challenge, is placing a significant burden on healthcare systems. Accurate and interpretable classification models are crucial for early diagnosis and effective intervention. Traditional machine learning techniques such as Extreme Gradient Boosting (XGBoost) and Random Forest have demonstrated robust classification performance on tabular medical datasets, but they continue to face challenges of model interpretability. Deep learning models such as TabNet offer the dual benefits of learned feature selection and interpretability via attention mechanisms. This study developed a weighted ensemble model that combines TabNet, XGBoost and Random Forest to address the trade-off between interpretability and predictive performance. The study used the Pima Indian Diabetes dataset as secondary data, together with expert clinical validation. The dataset contains 768 records with 8 features related to diabetes risk factors. The ensemble assigns optimized weights to the predictions of the three models, drawing on their complementary strengths. The results indicated that the weighted ensemble model outperformed the individual models while preserving interpretability. The ensemble achieved a balanced accuracy of 0.8630 ± 0.0146 (median 0.8350), precision of 0.8163 ± 0.0442 (median 0.8018), recall of 0.9376 ± 0.0341 (median 0.8900), F1 score of 0.8401 ± 0.0110 (median 0.8436), and ROC-AUC of 0.9026 ± 0.0172 (median 0.9044). By comparison, XGBoost attained a balanced accuracy of 0.8103 ± 0.0270 (median 0.8150) and precision of 0.7888 ± 0.0287 (median 0.7890), and Random Forest achieved a balanced accuracy of 0.8060 ± 0.0250 (median 0.8100) and precision of 0.7451 ± 0.0312 (median 0.7768).
Feature importance analysis identified the most significant predictors of diabetes, by normalized score, as glucose level (≈1), followed by age (≈0.458), insulin (≈0.434) and body mass index (BMI) (≈0.13), providing valuable clinical insights. This research contributes a novel computational framework that leverages weighted ensemble learning while preserving model explainability, a critical advancement for healthcare-aligned machine learning systems. This methodological contribution extends beyond diabetes classification and could benefit other clinical decision support systems operating on limited-feature medical datasets.	en_US
dc.language.iso	en	en_US
dc.publisher	KeMU	en_US
dc.subject	TabNet	en_US
dc.subject	Diabetes Classification	en_US
dc.subject	Weighted Ensemble Learning	en_US
dc.title	Enhancing Diabetes Classification Using A Weighted Ensemble Of TabNet, XGBoost And Random Forest	en_US
dc.type	Thesis	en_US
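
The abstract states that the ensemble assigns optimized weights to the outputs of the three models, but it does not say how those weights were found. The following is a minimal Python sketch of one common approach, assuming each model outputs a positive-class probability, the blend is thresholded at 0.5, and the weights are chosen by a coarse grid search that maximizes balanced accuracy on a validation split. The function names, the grid search, the threshold and the toy data are all illustrative assumptions, not the thesis implementation.

```python
from itertools import product


def weighted_vote(probs, weights, threshold=0.5):
    """Blend per-model positive-class probabilities into 0/1 labels.

    probs: one list of probabilities per model, all the same length.
    weights: one non-negative weight per model (normalized here).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    blended = [sum(w * p for w, p in zip(norm, row)) for row in zip(*probs)]
    return [1 if b >= threshold else 0 for b in blended]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(1 for p in pos if p == 1) / len(pos)
    specificity = sum(1 for p in neg if p == 0) / len(neg)
    return (sensitivity + specificity) / 2


def search_weights(probs, y_true, steps=10):
    """Grid-search weight vectors that sum to 1, scored by balanced accuracy."""
    best_score, best_weights = -1.0, None
    for combo in product(range(steps + 1), repeat=len(probs)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = [c / steps for c in combo]
        score = balanced_accuracy(y_true, weighted_vote(probs, weights))
        if score > best_score:
            best_score, best_weights = score, weights
    return best_weights, best_score


# Toy validation data, invented for illustration (not from the thesis).
y_val = [1, 0, 1, 0, 1, 0]
p_tabnet = [0.9, 0.4, 0.6, 0.2, 0.4, 0.6]  # hypothetical TabNet outputs
p_xgb = [0.8, 0.3, 0.4, 0.1, 0.7, 0.3]     # hypothetical XGBoost outputs
p_rf = [0.7, 0.6, 0.7, 0.3, 0.6, 0.4]      # hypothetical Random Forest outputs

weights, score = search_weights([p_tabnet, p_xgb, p_rf], y_val)
print(weights, score)
```

In this toy data each model misclassifies at least one sample on its own, while a suitable blend can correct those errors, which is the complementary-strengths effect the abstract describes. In practice the weights would be tuned on held-out folds rather than on the data used for the final evaluation.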

Files in this item

Name: Duncan_Obunge_Final_thesis.pdf
Size: 3.015Mb
Format: PDF
Description: Full text

This item appears in the following Collection(s)

Master of Science in Computer Information Systems [23]
