
    Enhancing Diabetes Classification Using a Weighted Ensemble of TabNet, XGBoost and Random Forest

    Date
    2025-09
    Author
    Obunge, Duncan Ogindo
    Type
    Thesis
    Language
    en

    Abstract
    The increasing prevalence of diabetes mellitus (DM), a leading global health challenge, is placing a significant burden on healthcare systems. Accurate and interpretable classification models are crucial for early diagnosis and effective intervention. Traditional machine learning techniques such as Extreme Gradient Boosting (XGBoost) and RandomForest have demonstrated robust classification performance on tabular medical datasets, but they continue to face the challenge of model interpretability. Deep learning models such as TabNet offer the twofold benefit of learned feature selection and interpretability via attention mechanisms. This study developed a weighted ensemble model that combines TabNet, XGBoost and RandomForest to address the trade-off between interpretability and strong performance. The study used the Pima Indian Diabetes dataset as secondary data, together with expert clinical validation. The dataset contains 768 records with 8 features related to diabetes risk factors. The ensemble assigns optimized weights to the classifications of the three models, drawing on their complementary strengths. The results indicated that the weighted ensemble model outperformed the individual models while preserving interpretability. The ensemble achieved a balanced accuracy of 0.8630 ± 0.0146 (median 0.8350), precision of 0.8163 ± 0.0442 (median 0.8018), recall of 0.9376 ± 0.0341 (median 0.8900), F1 score of 0.8401 ± 0.0110 (median 0.8436), and ROC-AUC score of 0.9026 ± 0.0172 (median 0.9044). By comparison, XGBoost attained a balanced accuracy of 0.8103 ± 0.0270 (median 0.8150) and precision of 0.7888 ± 0.0287 (median 0.7890), and RandomForest achieved a balanced accuracy of 0.8060 ± 0.0250 (median 0.8100) and precision of 0.7451 ± 0.0312 (median 0.7768).
    Feature importance analysis, based on normalized scores, identified the most significant predictors of diabetes as glucose level (≈1), followed by age (≈0.458), insulin (≈0.434) and body mass index (BMI) (≈0.13), providing valuable clinical insights. This research contributes a novel computational framework that leverages weighted ensemble learning while preserving model explainability, a critical advancement for healthcare-aligned machine learning systems. This methodological contribution extends beyond diabetes classification and could benefit other clinical decision support systems that operate on limited-feature medical datasets.
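
The abstract does not say how the ensemble weights were optimized, so the following is only a minimal sketch of one plausible reading: weighted soft voting over the three models' predicted probabilities, with the weights chosen by a coarse grid search that maximizes balanced accuracy on held-out data. The function names (`balanced_accuracy`, `ensemble_proba`, `search_weights`), the 0.1 grid step, the 0.5 decision threshold and the toy probabilities are all assumptions made for illustration. They are not taken from the thesis and do not reproduce its results.

```python
from itertools import product


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels 0/1.

    Assumes both classes are present in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def ensemble_proba(probas, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights)
    n = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n)
    ]


def search_weights(probas, y_true, step=0.1, threshold=0.5):
    """Grid-search three non-negative weights that sum to 1.

    Returns the weight triple with the highest balanced accuracy
    and that score. Hypothetical stand-in for the thesis's
    (unspecified) weight optimization.
    """
    k = round(1 / step)
    best_w, best_score = None, -1.0
    for a, b in product(range(k + 1), repeat=2):
        if a + b > k:
            continue
        w = (a / k, b / k, (k - a - b) / k)
        y_pred = [int(p >= threshold) for p in ensemble_proba(probas, w)]
        score = balanced_accuracy(y_true, y_pred)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Made-up validation labels and model outputs, for illustration only.
y_val = [1, 0, 1, 0, 1, 0]
p_tabnet = [0.9, 0.4, 0.4, 0.2, 0.7, 0.6]
p_xgb = [0.6, 0.3, 0.7, 0.6, 0.8, 0.2]
p_rf = [0.7, 0.6, 0.6, 0.3, 0.4, 0.3]

weights, score = search_weights([p_tabnet, p_xgb, p_rf], y_val)
print("weights:", weights, "balanced accuracy:", score)
```

Because the grid includes the corner cases where a single model receives all the weight, the selected ensemble can never score below the best individual model on the data used for the search. Any real gain would need to be confirmed on data not used to choose the weights.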
    URI
    http://repository.kemu.ac.ke/handle/123456789/2153
    Publisher
    KeMU
    Subject
    TabNet
    Diabetes Classification
    Weighted Ensemble Learning
    Collections
    • Master of Science in Computer Information Systems [23]

    Copyright © 2019  | Kenya Methodist University (KeMU) Library