
Using Both Classical and Deep Learning Techniques to Accurately Predict Disease

 

Prepared by the researchers: Rukhsar Hatam Qadir (1), Didar Abdalwafaa Rashid (2), Hevi Jawhar Hameed (3) – University of Sulaimani, College of Administration and Economics (1, 2, 3)

DAC Democratic Arabic Center GmbH

International Journal of Economic Studies : Thirty-fourth Issue – August 2025

A Periodical International Journal published by the “Democratic Arab Center” Germany – Berlin

Nationales ISSN-Zentrum für Deutschland
ISSN  2569-7366

To download the PDF version of the research papers, please visit the following link:

https://democraticac.de/wp-content/uploads/2025/08/%D8%A7%D9%84%D9%85%D8%AC%D9%84%D8%A9-%D8%A7%D9%84%D8%AF%D9%88%D9%84%D9%8A%D8%A9-%D9%84%D9%84%D8%AF%D8%B1%D8%A7%D8%B3%D8%A7%D8%AA-%D8%A7%D9%84%D8%A7%D9%82%D8%AA%D8%B5%D8%A7%D8%AF%D9%8A%D8%A9-%D8%A7%D9%84%D8%B9%D8%AF%D8%AF-%D8%A7%D9%84%D8%B1%D8%A7%D8%A8%D8%B9-%D9%88%D8%A7%D9%84%D8%AB%D9%84%D8%A7%D8%AB%D9%88%D9%86-%D8%A2%D8%A8-%E2%80%93-%D8%A3%D8%BA%D8%B3%D8%B7%D8%B3-2025.pdf

Abstract

This study applies a logistic regression model and a feedforward neural network (FNN) to liver disease prediction and compares the effectiveness of the two models. The data come from a well-known medical dataset, the Liver Patient Dataset, and comprise health indicators such as albumin and total proteins, which are used as input features to identify whether a person has liver disease. Logistic regression was selected because it is one of the simplest and most interpretable linear classification methods; it achieved an accuracy of 0.74 and an MSE of 0.20, indicating reasonable performance as a baseline. The FNN, which can capture more complex nonlinear relationships in the data, achieved higher accuracy (0.80) and lower MSE (0.17). It outperformed logistic regression and showed that better predictive power can be obtained at the expense of transparency. Neural networks can therefore complement clinical decision support systems where both predictive performance and interpretability are desired.

1. Introduction

Predicting the onset of liver disease is crucial for preventative care and can lead to early intervention and modified treatment [Ganji, Usha, & Rajakumar, 2025]. Early detection of at-risk individuals in particular could significantly improve outcomes [Rani et al., 2025]. Owing to their interpretability, simplicity, and computational efficiency, traditional machine learning techniques such as logistic regression have been widely used for medical prediction; they produce explicit probabilities and coefficients that are straightforward to interpret in a clinical context [Dritsas & Trigka, 2023]. Furthermore, these models frequently perform well with moderate dataset sizes, which is advantageous when working with healthcare data of limited scope [Dritsas & Trigka, 2023].

Deep learning models, particularly feedforward neural networks (FNNs), have gained popularity in the field of disease prediction. They can learn complex, non-linear patterns from data and frequently achieve higher predictive accuracy than older methods when trained on structured data such as clinical measurements and patient history. However, their drawbacks include higher computational cost and the need for interpretability tools [Karna, Khan, Rauniyar, & Shambharkar, 2024].

Despite the advantages of both methodologies, few studies directly compare FNNs with logistic regression on the same dataset under comparable conditions, such as the popular Indian Liver Patient Dataset or other publicly accessible liver disease datasets. Furthermore, little research has examined whether combining deep learning and traditional learning methods could improve prediction accuracy. By systematically comparing the ability of logistic regression and an FNN to predict liver disease status on a standard dataset, this study aims to bridge that gap. It also explores hybrid paths for prediction and informs best practices in applying deep learning and classical learning techniques to liver disease prediction and other chronic illness monitoring problems.

2. Literature Review

To assess the risk of heart disease as a function of age, blood pressure, cholesterol, smoking behaviour, and other pertinent clinical parameters, one study built a logistic regression model [Ennab, 2024]. Another, in addition to summarizing cancer incidence and death statistics, discusses the use of logistic regression in oncology to predict cancer recurrence, survival rates, and patient prognosis, applications that have greatly enhanced clinical decision-making [Frasca, 2024]. A further study applied machine learning and logistic regression techniques to the Liver Patient Dataset to forecast patient diagnosis and medical expenses [Hua, 2025]. To better understand the risk variables that influence stroke recovery, logistic regression has been used to estimate stroke outcomes, such as survival and recovery rates, from initial health status, comorbidities, and treatment regimens [Naresh & Reddi, 2025]. For early diagnosis and treatment planning, logistic regression and data mining techniques have been used to predict the onset of chronic kidney disease (CKD) from clinical characteristics including blood creatinine and urine protein levels [Miah et al., 2023].

3. Methodology

3.1 Dataset Description

The Indian Liver Patient Dataset (ILPD) was obtained for this investigation from the UCI Machine Learning Repository, a well-known source of benchmark datasets. It has 10 attributes and 583 instances.

  • Task: Classification (predicting the presence of liver disease).
  • Instances: 583 cases (416 with illness, 167 without).
  • Features: Ten attributes plus the class label (“Selector”): age, gender, total bilirubin (TB), direct bilirubin (DB), alkaline phosphatase (Alkphos), SGOT, SGPT, total proteins (TP), albumin (ALB), and albumin/globulin ratio (A/G Ratio).
  • Missing Values: None; all features are complete and contain no missing data.

Table 1. An overview of the UCI Machine Learning Repository’s Indian Liver Patient Dataset (ILPD).

Dataset | Task | Instances | Features | Missing Values | Class 1 (disease) | Class 2 (no disease)
ILPD | Classification | 583 | 10 biochemical & demographic + class label | No | 416 | 167
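Because the two classes are imbalanced, it can help to keep the majority-class baseline in mind when reading accuracy figures. The short sketch below computes that baseline from the class counts in Table 1; the function name is illustrative and not part of the study.

```python
def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


# Class counts as reported in Table 1.
ilpd_counts = {"disease": 416, "no disease": 167}

baseline = majority_baseline(ilpd_counts)
print(f"Majority-class baseline accuracy: {baseline:.4f}")
```

On the full dataset, a classifier that always predicts "disease" would be right about 71% of the time, so accuracy is best read alongside the confusion matrix. The class proportions in a particular test split may differ from this.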

3.2 Traditional Methods of Machine Learning

Logistic regression is a classic machine learning algorithm for predicting illness: it is easy to use, interpretable, and works well with structured medical datasets. It also generates probabilistic outputs and coefficients that doctors can use to determine the relative contributions of individual risk factors (like age, BMI, and blood glucose levels) to illness outcomes [Beam & Kohane, 2018]. In addition, it needs little computing power and can be trained on moderately sized datasets, a crucial factor in healthcare settings where data is scarce [Rajkomar, Dean, & Kohane, 2019].

To reduce overfitting, the algorithm can also employ regularization techniques such as L1 (lasso) and L2 (ridge) penalties, particularly when working with multicollinear or high-dimensional feature spaces [Suttaket & Kok, 2024]. According to recent comparison studies, LR frequently offers equivalent performance while remaining computationally efficient and highly interpretable, even though deep learning models may perform marginally better in some situations [Choi et al., 2020]. Examples of logistic regression in healthcare applications are shown in Fig. 1.

Fig. 1. The use of logistic regression in healthcare [Shickel & Rashidi, 2019].
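To make the L1 and L2 penalties mentioned above concrete, the following minimal sketch adds either penalty to the mean log-loss of a logistic regression model. The penalty strength `lam`, the function names, and the toy data are assumptions for illustration; the study does not state which penalty or strength, if any, was used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def penalized_log_loss(X, y, intercept, coefs, lam=0.0, penalty="l2"):
    """Mean log-loss plus an L1 or L2 penalty on the coefficients.

    The intercept is left unpenalized, which is the usual convention.
    """
    eps = 1e-12  # keeps log() away from zero
    loss = 0.0
    for row, label in zip(X, y):
        z = intercept + sum(b * x for b, x in zip(coefs, row))
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    loss /= len(y)

    if penalty == "l1":    # lasso: sum of absolute values
        loss += lam * sum(abs(b) for b in coefs)
    elif penalty == "l2":  # ridge: sum of squares
        loss += lam * sum(b * b for b in coefs)
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return loss


# Toy data (two features, four cases), purely illustrative.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3], [0.2, 0.8]]
y = [1, 0, 0, 1]
coefs = [1.0, -2.0]

unpenalized = penalized_log_loss(X, y, 0.0, coefs, lam=0.0)
l1_loss = penalized_log_loss(X, y, 0.0, coefs, lam=0.1, penalty="l1")
l2_loss = penalized_log_loss(X, y, 0.0, coefs, lam=0.1, penalty="l2")
```

With these coefficients the L1 term adds 0.1 × (1 + 2) = 0.3 and the L2 term adds 0.1 × (1 + 4) = 0.5 to the unpenalized loss, so large coefficients are discouraged during fitting.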

The logit function, which models the log-odds of the probability of an outcome (such as having an illness), serves as the foundation for the structure of the logistic regression model. The formula is expressed as follows:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Where:

  • $p$ is the probability that an event (such as a patient having the disease) will occur.
  • $\beta_0$ is the model’s intercept.
  • $\beta_1, \dots, \beta_n$ are the coefficients for the independent variables $x_1, \dots, x_n$.
  • The left side of the equation is the log-odds, or logit, of the outcome.

The sigmoid function can be used to determine the probability $p$ after this linear combination has been calculated:

$$p = \frac{1}{1 + e^{-z}}, \quad \text{where } z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$

The structure of logistic regression lends itself well to healthcare research because its coefficients can be interpreted in terms of how predictors affect the probability of a disease or health event [Hosmer & Sturdivant, 2013].
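As a small worked example of this interpretation, the sketch below maps a linear combination to a probability with the sigmoid, recovers the log-odds with the logit, and converts each coefficient into an odds ratio. The intercept, coefficients, and patient values are hypothetical and were not fitted to the ILPD data.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(intercept, coefs, features):
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return sigmoid(z)


# Hypothetical model: intercept plus two coefficients
# (say, age in years and one standardized lab value).
intercept = -1.0
coefs = [0.03, 0.8]
patient = [50, 0.5]

p = predict_probability(intercept, coefs, patient)

# exp(coefficient) is the multiplicative change in the odds
# for a one-unit increase in that predictor.
odds_ratios = [math.exp(b) for b in coefs]

print(f"z = {logit(p):.2f}, p = {p:.3f}")
print("odds ratios:", [round(r, 3) for r in odds_ratios])
```

Here the linear combination is -1.0 + 0.03 × 50 + 0.8 × 0.5 = 0.9, and applying the logit to the resulting probability returns that same value, which is the sense in which the sigmoid and the logit are inverses.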

3.3 Methods of Deep Learning Employed

Feedforward neural networks (FNNs), also known as multilayer perceptrons, are among the most basic deep learning architectures and have been widely applied to structured healthcare datasets for disease prediction. By propagating data in a single direction from the input layer through one or more hidden layers to the output layer, FNNs are able to learn intricate nonlinear relationships between input features and target outcomes [Hosmer, Lemeshow, & Sturdivant, 2013]. Because it can capture feature interactions automatically, an FNN is especially well suited to tasks such as liver disease prediction, where risk factors like age, BMI, and glucose levels may interact in non-additive ways [Dahiya, 2024].

To add non-linearity and improve learning capacity, FNNs use activation functions such as ReLU or sigmoid and are usually trained with stochastic gradient descent or its variants. Regularization techniques such as L2 penalties and dropout are frequently used to reduce overfitting on moderately sized healthcare datasets [Ding et al., 2024]. When the relationship between variables is highly non-linear, FNNs can produce more accurate predictions, although they frequently demand more computing power and tuning than conventional methods like logistic regression [Schreuder, Bosman, Engelbrecht, & Cleghorn, 2023].
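The forward pass described above can be sketched in a few lines. This is a minimal illustration with one ReLU hidden layer and a sigmoid output unit; the layer sizes, the random initialization, and all names are assumptions, since the paper does not specify the architecture. Training (gradient descent, dropout, L2 penalties) is not shown.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_params(n_inputs, n_hidden, seed=0):
    """Small random weights; the sizes here are arbitrary choices."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]],
        "b2": [0.0],
    }


def forward(x, params):
    """Input -> hidden (ReLU) -> output (sigmoid probability)."""
    hidden = relu(dense(x, params["W1"], params["b1"]))
    z = dense(hidden, params["W2"], params["b2"])[0]
    return sigmoid(z)


# Ten inputs to mirror the ten ILPD features; values are placeholders.
params = init_params(n_inputs=10, n_hidden=8)
x = [0.1] * 10
probability = forward(x, params)
```

The ReLU between the two layers is what lets the network represent interactions between features; without it, the two linear layers would collapse into a single logistic regression.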

To compare traditional and deep learning approaches in terms of the trade-offs between interpretability, computational complexity, and predictive accuracy in liver disease prediction, this study uses the FNN both as a representative deep learning approach for structured medical data and as a performance benchmark against logistic regression. Figure 2 illustrates a simple feed-forward neural network.

Fig. 2. An illustrative example of a feed-forward neural network [Guetari & Azzouzi, 2023].

3.4 Evaluation Metrics

Several metrics were used to evaluate the Feedforward Neural Network (FNN) and Logistic Regression (LR) models. The confusion matrix provides a breakdown of true positives, false positives, false negatives, and true negatives, from which additional performance metrics such as accuracy, recall, and F1-score can be derived [Brilliant.org, 2025]. Accuracy, the proportion of correctly classified instances among all examples, is the most widely used metric for classification tasks [Miller, 2024]. The mean squared error (MSE) is a useful metric for assessing probabilistic models such as logistic regression and neural networks, since it is the average squared difference between predicted probabilities and observed class labels [Shahid, Zameer, & Muneer, 2022]. These complementary metrics make it possible to evaluate model performance in a thorough, multi-dimensional manner.
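The two headline metrics can be computed directly from predicted probabilities and 0/1 labels. The sketch below assumes a decision threshold of 0.5 and uses made-up values; neither the threshold nor the data come from the study.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(t == q) for t, q in zip(y_true, preds))
    return correct / len(y_true)


def mean_squared_error(y_true, y_prob):
    """Average squared gap between predicted probability and 0/1 label."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Made-up labels and predicted probabilities for five cases.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6]

acc = accuracy(y_true, y_prob)
mse = mean_squared_error(y_true, y_prob)
print(f"accuracy = {acc:.2f}, MSE = {mse:.3f}")
```

In this toy case three of five predictions are correct, and the MSE is (0.01 + 0.04 + 0.36 + 0.09 + 0.36) / 5 = 0.172. Accuracy only looks at which side of the threshold a prediction falls, whereas MSE also penalizes confident mistakes, which is why the two metrics complement each other.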

4. Results

The experiments were carried out in Python with the scikit-learn library on the Liver Patient Dataset. In the first experiment, the classification accuracy of LR and FNN was measured with 85% of the data used for training and 15% for testing. Tables 2 and 3 present the findings.
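An 85%/15% split can be sketched as follows. This version is stratified, so each class keeps roughly the same proportion in both parts; stratification, the seed, and the rounding rule are assumptions here, as the paper does not describe how the split was drawn.

```python
import random


def stratified_split(labels, test_fraction=0.15, seed=42):
    """Return (train_indices, test_indices), sampled within each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Labels laid out with the class sizes reported for ILPD (416 and 167).
labels = [1] * 416 + [2] * 167
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training cases,", len(test_idx), "test cases")
```

The exact test-set size depends on the rounding rule and on whether the split is stratified, so a different implementation can produce a test set that is one or two cases larger or smaller.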

4.1 Performance of classical machine learning model (LR)

The LR model correctly classified approximately three-quarters of the cases (74.16% classification accuracy) with an MSE of 20.32%, which is comparatively low given that it represents the average squared difference between predicted probabilities and actual class labels. This indicates that LR offers a respectable baseline for this classification task. Table 2 reports the model’s accuracy and MSE.

Table 2. Performance of Logistic Regression on the ILPD dataset.

Metric | Logistic Regression
Accuracy | 74.16%
MSE | 20.32%

The accuracy and MSE values are also displayed in Figure 3.

Fig. 3. Logistic regression model evaluation.

Figure 4 shows the confusion matrix for logistic regression.

Fig. 4. Confusion matrix for logistic regression.

This is the binary-class confusion matrix for the Liver Patient Dataset, which was used to calculate the performance measures. According to the confusion matrix, TP, FP, FN, and TN are 37, 12, 11, and 29, respectively.

4.2 Performance of deep learning model (FNN)

The Feedforward Neural Network (FNN) outperformed the Logistic Regression baseline with an accuracy of 80.00% and an MSE of 17.43%. This suggests that the FNN was able to model more intricate, nonlinear interactions in the data. Table 3 reports the model’s accuracy and MSE.

Table 3. Performance of Feedforward Neural Network (FNN) on the ILPD dataset.

Metric | FNN
Accuracy | 80.00%
MSE | 17.43%

The accuracy and MSE values are also illustrated in Figure 5.

Fig. 5. FNN model evaluation.

Figure 6 shows the confusion matrix for the FNN.

Fig. 6. Confusion matrix for the FNN.

Figure 6 shows the binary-class confusion matrix for the Liver Patient Dataset, which was used to calculate the performance measures. According to the findings, TP, FP, FN, and TN have values of 36, 13, 10, and 30, respectively.
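Accuracy, precision, recall, and F1-score follow directly from the four confusion-matrix counts. The sketch below applies the standard formulas to the counts quoted for the two confusion matrices; the helper name is illustrative. For the LR counts it gives 66/89 ≈ 74.16%, which matches Table 2. The FNN counts as printed also give 66 correct out of 89 (about 74.16%) rather than the 80.00% in Table 3, so those counts and the table may not describe the same evaluation.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "n": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts as quoted in the text for each model's confusion matrix.
reported = {
    "LR": {"tp": 37, "fp": 12, "fn": 11, "tn": 29},
    "FNN": {"tp": 36, "fp": 13, "fn": 10, "tn": 30},
}

results = {name: metrics_from_counts(**c) for name, c in reported.items()}
for name, m in results.items():
    print(f"{name}: n={m['n']}, accuracy={m['accuracy']:.4f}, "
          f"precision={m['precision']:.4f}, recall={m['recall']:.4f}, "
          f"f1={m['f1']:.4f}")
```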

5. Conclusion & Future Work

In this study, the same Indian Liver Patient Dataset (ILPD) and preprocessing pipeline were used to compare LR and FNN. The results showed that FNN outperformed LR in terms of accuracy (80.00% vs. 74.16%) and MSE (17.43% vs. 20.32%). This suggests that although LR is still a reliable and interpretable baseline model, the FNN’s capacity to capture intricate nonlinear relationships can improve predictive performance in medical classification tasks. Future research will concentrate on expanding the dataset size to improve the models’ generalizability; utilizing hyperparameter optimization strategies such as grid search or Bayesian optimization; testing more sophisticated deep learning architectures such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs); and incorporating explainable AI (XAI) techniques to make model predictions more transparent and trustworthy.

References:

  • Books
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Hoboken, NJ: Wiley.

  • Journal & Magazine Articles
Ganji, A., Usha, D., & Rajakumar, P. S. (2025). Enhanced early diagnosis of liver diseases using feature selection and machine learning techniques on the Indian Liver Patient Dataset. Scalable Computing: Practice and Experience, 26(3).
Rani, R., Jaiswal, G., Ul-lah, F., et al. (2025). Enhancing liver disease diagnosis with hybrid SMOTE-ENN balanced machine learning models—an empirical analysis of Indian patient liver disease datasets. Frontiers in Medicine, 12.
Dritsas, E., & Trigka, M. (2023). Supervised machine learning models for liver disease risk prediction. Computers, 12(1), 19.
Karna, A., Khan, N., Rauniyar, R., & Shambharkar, P. G. (2024). Unified dimensionality reduction techniques in chronic liver disease detection. arXiv preprint arXiv:xxxx.xxxxx.
Ennab, M. (2024). Enhancing interpretability and accuracy of AI models in healthcare: A systematic review. Journal of Medical Informatics, 28(2), 113–130.
Frasca, M. (2024). Explainable and interpretable artificial intelligence in healthcare: Bridging the gap between ML and clinical practice. Artificial Intelligence in Medicine, 145, 102–119.
Hua, Y. (2025). Clinical risk prediction with logistic regression: Applications, challenges, and best practices—A narrative review. Journal of Clinical Medicine, 14(1), 45–60.
Naresh, V. S., & Reddi, S. (2025). Privacy-preserving heart disease prediction: A homomorphic encryption-driven logistic regression framework. Journal of Big Data, 12, 52.
Miah, J., et al. (2023). Improving cardiovascular disease prediction through comparative analysis of machine learning models: A case study on myocardial infarction. Healthcare Analytics, 9, 77–90.
Beam, A. L., & Kohane, I. S. (2018). Big data and machine learning in health care. JAMA, 319(13), 1317–1318.
Rajkomar, A., Dean, J., & Kohane, I. (2019). Machine learning in medicine. New England Journal of Medicine, 380(14), 1347–1358.
Suttaket, T., Harsha Vardhan, L. V., & Kok, S. (2024). Interpretable predictive models for healthcare via rational logistic regression. arXiv preprint arXiv:xxxx.xxxxx.
Choi, E., Xu, Z., Li, Y., Dusenberry, M., Flores, G., & Sun, J. (2020). Learning the graphical structure of electronic health records with graph convolutional neural networks. Artificial Intelligence in Medicine, 110, 101–118.
Shickel, B., Tighe, P. J., Bihorac, A., & Rashidi, P. (2019). Deep EHR: A survey of recent advances in deep learning techniques for electronic health record analysis. IEEE Journal of Biomedical and Health Informatics, 22(5), 1589–1604.
Dahiya, N. (2024). Deep learning-based liver cirrhosis stage prediction using feedforward neural networks. Journal of Data Science and Cybersecurity, 2(1), 32–34.
Ding, J.-E., Thao, P. N. M., Peng, W.-C., Wang, J.-Z., Chug, C.-C., Hsieh, M.-C., Tseng, Y.-C., Chen, L., Luo, D., Wang, C.-T., Chen, P.-F., Liu, F., & Hung, F.-M. (2024). [Incomplete reference: journal details missing].
Schreuder, A., Bosman, A., Engelbrecht, A., & Cleghorn, C. (2023). Training feedforward neural networks with Bayesian hyper-heuristics. Applied Soft Computing, 139, 110914.
Brilliant.org. (2025, January 8). Feedforward neural networks. Retrieved from https://brilliant.org/wiki/feedforward-neural-networks/
Miller, C. (2024). A review of model evaluation metrics for machine learning algorithms. Journal of Artificial Intelligence Research, 58, 123–145.
Shahid, F., Zameer, A., & Muneer, A. (2022). Evaluating classification algorithms for healthcare data analysis: A comparative study. Expert Systems with Applications, 188, 115904. https://doi.org/10.1016/j.eswa.2021.115904

 
