
Diabetes Prediction using DL

Open tarunvyshnav777 opened this issue 1 year ago • 19 comments

Deep Learning Simplified Repository (Proposing new issue)

:red_circle: Project Title : Diabetes Prediction
:red_circle: Aim : To create a DL model that predicts whether a person has diabetes.
:red_circle: Dataset : https://www.kaggle.com/datasets/mathchi/diabetes-data-set
:red_circle: Approach : Implement 3-4 algorithms and compare their accuracy scores to find the best-fitting model. Also, do not forget to do an exploratory data analysis before creating any model.


📍 Follow the Guidelines to Contribute in the Project :

  • You need to create a separate folder named as the Project Title.
  • Inside that folder, there will be four main components.
    • Images - To store the required images.
    • Dataset - To store the dataset or, information/source about the dataset.
    • Model - To store the machine learning model you've created using the dataset.
    • requirements.txt - This file will contain the packages/libraries required to run the project on other machines.
  • Inside the Model folder, the README.md file must be filled in properly, with proper visualizations and conclusions.

:red_circle::yellow_circle: Points to Note :

  • Issues will be assigned on a first-come, first-served basis, 1 Issue == 1 PR.
  • "Issue Title" and "PR Title" should be the same. Include the issue number along with it.
  • Follow the Contributing Guidelines & Code of Conduct before you start contributing.

:white_check_mark: To be Mentioned while taking the issue :

  • Full name : Burugupalli Tarun Vishnav
  • GitHub Profile Link : https://github.com/tarunvyshnav777
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project :
  1. Prepare the labeled dataset and split it into training and testing sets.
  2. Extract or engineer relevant features from the data.
  3. Instantiate an SVM classifier with the chosen kernel function and hyperparameters.
  4. Train the SVM classifier using the training data to find the optimal hyperplane.
  5. Evaluate the model's performance on the testing set and use it for making predictions on new data.
  • What is your participant role? SSOC' 2023
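
The five steps in the approach above could be sketched roughly as follows. This is only an illustration, assuming scikit-learn is installed. The data is a synthetic placeholder from `make_classification` (8 numeric features, binary label) rather than the Kaggle CSV, and the kernel and hyperparameter values are arbitrary choices, not ones taken from this issue.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data (assumption): 8 numeric features and a binary label,
# standing in for the real dataset, which is not loaded here.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Step 1: split into training and testing sets, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Steps 2-4: scale the features, then fit an SVM with an RBF kernel.
# C and gamma are illustrative defaults, not tuned values.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_clf.fit(X_train, y_train)

# Step 5: evaluate on the held-out test set.
y_pred = svm_clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no test-set statistics leak into training.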

Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

tarunvyshnav777 avatar Jun 12 '23 03:06 tarunvyshnav777

What are the deep learning methods you want to implement here? @tarunvyshnav777

abhisheks008 avatar Jun 12 '23 14:06 abhisheks008

  • Full name : Ananya Sen
  • GitHub Profile Link : https://github.com/vanya-anya
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project :
  1. Prepare the labeled dataset and split it into training and testing sets.
  2. Extract or engineer relevant features from the data.
  3. Instantiate an SVM classifier with the chosen kernel function and hyperparameters.
  4. Train the SVM classifier using the training data to find the optimal hyperplane.
  5. Evaluate the model's performance on the testing set and use it for making predictions on new data.
  • What is your participant role? SSOC' 2023

vanya-anya avatar Jun 14 '23 16:06 vanya-anya

> What are the deep learning methods you want to implement here? @tarunvyshnav777

@abhisheks008 I'm confused here. Do we only need to work with DL models? SVM is the best choice because of the nature of the dataset (https://www.kaggle.com/datasets/mathchi/diabetes-data-set). If SVM is fine, please assign this to me under SSOC'23.

tarunvyshnav777 avatar Jun 14 '23 16:06 tarunvyshnav777

  • Full name : Riya Parag Dhanduke
  • GitHub Profile Link : https://github.com/riiyaa24
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project :
  1. Prepare the labeled dataset and split it into training and testing sets.
  2. Extract or engineer relevant features from the data.
  3. Instantiate an SVM classifier with the chosen kernel function and hyperparameters.
  4. Train the SVM classifier using the training data to find the optimal hyperplane.
  5. Evaluate the model's performance on the testing set and use it for making predictions on new data.
  • What is your participant role? SSOC' 2023, a contributor in the development of open source projects

riiyaa24 avatar Jun 14 '23 16:06 riiyaa24

  • Full name : Chandrima Paul
  • GitHub Profile Link : https://github.com/chandrima200
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project :
  1. Prepare the labeled dataset and split it into training and testing sets.
  2. Extract or engineer relevant features from the data.
  3. Instantiate an SVM classifier with the chosen kernel function and hyperparameters.
  4. Train the SVM classifier using the training data to find the optimal hyperplane.
  5. Evaluate the model's performance on the testing set and use it for making predictions on new data.
  • What is your participant role? SSOC' 2023

chandrima200 avatar Jun 14 '23 17:06 chandrima200

  • Full name : Sarmistha Saha
  • GitHub Profile Link : https://github.com/sarmistha-02
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project :
  1. Prepare the labeled dataset and split it into training and testing sets.
  2. Extract or engineer relevant features from the data.
  3. Instantiate an SVM classifier with the chosen kernel function and hyperparameters.
  4. Train the SVM classifier using the training data to find the optimal hyperplane.
  5. Evaluate the model's performance on the testing set and use it for making predictions on new data.
  • What is your participant role? SSOC' 2023

sarmistha-02 avatar Jun 14 '23 17:06 sarmistha-02

> > What are the deep learning methods you want to implement here? @tarunvyshnav777
>
> @abhisheks008 I'm confused here, we only need to work with dl models? But SVM is the best choice because of nature of dataset https://www.kaggle.com/datasets/mathchi/diabetes-data-set if SVM is fine please assign this under SSOC'23

@tarunvyshnav777 try to find a dataset that is compatible with deep learning methods.

abhisheks008 avatar Jun 15 '23 05:06 abhisheks008

  • Full name : Mohammed Owais
  • GitHub Profile Link : https://github.com/OWAIS-THEGREAT
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project : First I will analyze the dataset in the link. Then I will remove those pictures that are not suitable for processing. Then I will preprocess the data to make it suitable for the models. In my opinion, the best approach is to apply an ANN to this dataset. Then I will try different parameters to get the best results and provide the most accurate model for this repo. I think I have the required knowledge for this issue.
  • What is your participant role? SSOC' 2023

OWAIS-THEGREAT avatar Jun 15 '23 12:06 OWAIS-THEGREAT

> Full name : Mohammed Owais GitHub Profile Link : https://github.com/OWAIS-THEGREAT Email ID : [email protected] Participant ID (if applicable): Approach for this Project : First I will analyze the datasets in the link. Then I will remove those pictures that are not suitable for processing. Then I will preprocess the data to make it suitable for the Models. In my opinion, the best approach is to apply ANN to this dataset. Then I will use different parameters to get the best results. Will provide the most accurate model for this repo. I think I have the required amount of knowledge for this issue. What is your participant role? SSOC' 2023

The mentioned dataset does not contain any image files. It is suitable for implementing ML models, not image processing methods.

abhisheks008 avatar Jun 15 '23 12:06 abhisheks008

  • Full name : Mohammed Owais
  • GitHub Profile Link : https://github.com/OWAIS-THEGREAT
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project : First I will analyze the dataset in the link. Then I will preprocess the data to make it suitable for the models. In my opinion, the best approach is to apply an ANN, and I will also try different models on this dataset. Then I will try different parameters to get the best results and provide the most accurate model for this repo. I think I have the required knowledge for this issue.
  • What is your participant role? SSOC' 2023

OWAIS-THEGREAT avatar Jun 15 '23 12:06 OWAIS-THEGREAT

> > > What are the deep learning methods you want to implement here? @tarunvyshnav777
> >
> > @abhisheks008 I'm confused here, we only need to work with dl models? But SVM is the best choice because of nature of dataset https://www.kaggle.com/datasets/mathchi/diabetes-data-set if SVM is fine please assign this under SSOC'23
>
> @tarunvyshnav777 try to find out such a dataset which is compatible with deep learning methods.

@abhisheks008 The dataset: https://www.kaggle.com/datasets/iammustafatz/diabetes-prediction-dataset

Approach for this Project:

  1. Extract or engineer relevant features from the data and perform EDA on it.
  2. Preprocess the data, handle missing values, and split it into training and testing sets.
  3. Use a Sequential model in TensorFlow/Keras.
  4. Implement 2-3 algorithms such as MLP, CNN, and RNN.
  5. Compare the algorithms based on accuracy scores to determine the best-fitted model.
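
As a rough illustration of steps 3 and 5, a small MLP built with the Keras `Sequential` API might look like the sketch below. It assumes TensorFlow 2.x is installed. The input is random placeholder data with 8 numeric features (an assumption, not the real dataset), the layer sizes are arbitrary, and the CNN and RNN variants mentioned in step 4 are not shown.

```python
import numpy as np
import tensorflow as tf


def build_mlp(n_features: int) -> tf.keras.Model:
    """Small fully connected binary classifier (illustrative layer sizes)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),
        # Single sigmoid unit: output is the predicted probability of class 1.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Placeholder data (assumption): random features with a simple synthetic label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = build_mlp(X.shape[1])
# validation_split holds out the last 20% of rows to monitor overfitting.
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)

loss, acc = model.evaluate(X, y, verbose=0)
print(f"accuracy on placeholder data: {acc:.3f}")
```

For a real comparison, each candidate model would be evaluated on the same held-out test split rather than on the data it was trained on.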

@abhisheks008 Assign this under SSOC'23.

tarunvyshnav777 avatar Jun 15 '23 16:06 tarunvyshnav777

Go ahead @tarunvyshnav777. Issue assigned to you.

abhisheks008 avatar Jun 16 '23 03:06 abhisheks008

  • Full name : Aindree Chatterjee
  • GitHub Profile Link : https://github.com/aindree-2005
  • Email ID : [email protected]
  • Participant ID (if applicable): CodePeak 2023
  • Approach for this Project : SVC vs ANN comparison

aindree-2005 avatar Dec 11 '23 03:12 aindree-2005

  • Full name : Avaya Aggarwal
  • GitHub Profile Link : https://github.com/OnePunchMonk
  • Email ID : [email protected]
  • Participant ID (if applicable): CodePeak 2023
  • Approach for this Project : PyCaret for AutoML, Ensembling

OnePunchMonk avatar Dec 11 '23 09:12 OnePunchMonk

Hi @abhisheks008, please assign this issue to me. I have worked on a similar problem, predicting whether a person is healthy or not, using the same approach you mentioned.

YashSachan2 avatar Dec 12 '23 19:12 YashSachan2

  • Full Name: Kunal Sharma
  • GitHub Profile Link: https://github.com/KunalSharmaGit
  • Email ID: [email protected]
  • Approach For this Project: Preprocess the data and split it into training, validation and test sets. Design and train deep learning models such as a feedforward neural network, a convolutional neural network (CNN) and a recurrent neural network (RNN). Compare their accuracy scores on the test set to identify the most effective model for diabetes prediction.
  • What is your Participant Role? SWOC'23

KunalSharmaGit avatar Jan 02 '24 05:01 KunalSharmaGit

Cool @KunalSharmaGit issue assigned to you. You can start working on it.

Suggestion: Try using ResNet or BERT for this project and see what results they produce.

abhisheks008 avatar Jan 02 '24 05:01 abhisheks008

@abhisheks008 Thanks for assigning this.

KunalSharmaGit avatar Jan 02 '24 05:01 KunalSharmaGit

  • Full Name: Sayanta Chowdhury
  • GitHub Profile Link: https://github.com/sayanta28
  • Email ID: [email protected]
  • Approach For this Project: I will split the dataset into training, validation and test sets. Then I will design and train DL models such as LSTM, CNN and RNN on the dataset. I will compare the accuracy scores of those models on the test set to identify the most effective model for diabetes prediction.
  • What is your Participant Role? GSSoC'24

sayanta28 avatar May 10 '24 18:05 sayanta28

  • Full name : Gaurav Kumar Singh
  • GitHub Profile Link : https://github.com/gaurav-576
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project : I would prepare the labeled dataset and split it into training and testing sets. Then I would fit an Artificial Neural Network on the dataset while guarding against overfitting. I would also like to add a point about the confusion matrix. Since it's a diabetes prediction model, I would focus on minimising false negatives by prioritising recall, while maintaining good overall accuracy. In other words, the model should not predict a person to be non-diabetic if they are diabetic, since that mistake would cause trouble in a real-life scenario. Finally, I would evaluate the model's performance on the testing set and use it for making predictions on new data.
  • What is your participant role? GSSoC'24
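
To make the recall argument above concrete, here is a minimal sketch in plain Python. The labels and predicted probabilities are made-up illustrative values, not results from any model in this thread, and the helper names are hypothetical. It shows how lowering the decision threshold can remove false negatives without changing accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means diabetic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def predict_at(probs, threshold):
    """Turn predicted probabilities into 0/1 labels at a given threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Made-up example values (assumption): 4 diabetic and 6 non-diabetic people.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]

results = {}
for threshold in (0.5, 0.3):
    tp, fp, fn, tn = confusion_counts(y_true, predict_at(probs, threshold))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    results[threshold] = (recall, accuracy)
    print(f"threshold={threshold}: fn={fn}, recall={recall}, accuracy={accuracy}")
```

On these values both thresholds give an accuracy of 0.7, but recall rises from 0.5 to 1.0 at the lower threshold, at the cost of more false positives. This is why accuracy alone can hide the errors that matter most here.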

Gaurav-576 avatar May 11 '24 23:05 Gaurav-576

Hi @Gaurav-576 can you specify the algorithms you are going to use here for this project?

abhisheks008 avatar May 12 '24 03:05 abhisheks008

Diabetes prediction is a binary classification problem, so the machine learning algorithms I would like to try are Logistic Regression, Support Vector Machines (SVM) and Random Forest. If these do not fit the problem well and do not give good accuracy, I would try k-nearest neighbours (k-NN). For deep learning, the most suitable approach would be to build an artificial neural network, trying different activation functions and an optimizer such as Stochastic Gradient Descent or Adam, with the aim of fitting the data properly and getting high accuracy on both the training and test data.

Gaurav-576 avatar May 12 '24 03:05 Gaurav-576

Hi @Gaurav-576 one issue at a time.

abhisheks008 avatar May 12 '24 03:05 abhisheks008

Hi, I would like to contribute to this issue under GSSoC'24 as a contributor. Please assign this issue to me.

Aryanmartinian avatar May 12 '24 11:05 Aryanmartinian

Hi @Aryanmartinian can you please comment as per the issue template?

abhisheks008 avatar May 12 '24 13:05 abhisheks008

  • Full Name - Aryan Mishra
  • Github Profile Link - https://github.com/Aryanmartinian
  • Email ID - [email protected]
  • Approach for this Project - I would first prepare the dataset and do EDA on it, and then divide it into training and testing sets. Then I would fit an ANN on the dataset, use optimizers to improve the performance of the model, and evaluate the model on the selected metrics.

Aryanmartinian avatar May 12 '24 14:05 Aryanmartinian

Please assign me this issue

Aryanmartinian avatar May 12 '24 14:05 Aryanmartinian

Hi @Aryanmartinian you need to be specific with the approach. You have to give a detailed approach before taking an issue; it's obvious that neural networks are going to be used here. You need to be specific about the models/algorithms you are planning to use.

abhisheks008 avatar May 12 '24 14:05 abhisheks008

Hi @abhisheks008, could you please assign this issue to me? I have experience working with Machine Learning and Deep Learning.

  • Full Name: Mule Sai Krishna Reddy
  • Github Profile Link: https://github.com/saikrishna823
  • Email ID: [email protected]
  • Participant ID (if applicable):
  • Approach for this Project : Since it is a binary classification problem, I will use algorithms such as Logistic Regression, SVM, Random Forest, Decision Trees and XGBoost, and compare the accuracies of all models to find the best one. I will also build an ANN model, trying different activation functions to get better accuracy, and use TabNet, a deep learning model for tabular data. For user interaction, I will create a web interface using Streamlit or Flask. Please assign me this issue with the proper level tag. Looking forward to contributing to this issue.
  • What is your participant role? I am participating as a contributor through GSSoC'24.

saikrishna823 avatar May 14 '24 06:05 saikrishna823

Hi @saikrishna823 as this project repo solely demands deep learning projects, please make sure your approach is based on deep learning rather than simple machine learning methods. You can share your enhanced/updated approach for this project.

abhisheks008 avatar May 14 '24 07:05 abhisheks008