🦠 Model Request: TDC Skin Reaction

Open GemmaTuron opened this issue 2 years ago • 5 comments

Model Title

Skin Reaction (TDC dataset)

Publication

Hello @alaminumar!

As part of your Outreachy contribution, we have assigned you the dataset "Skin Reaction" from the Therapeutics Data Commons (TDC) to try to build a binary classification ML model. Please copy the provided Google Colab template and use this issue to provide updates on your progress. We value not only building the model but also interpreting its results.

Code

No response

GemmaTuron • Oct 24 '22 09:10

Thanks, Gemma.

alaminumar • Oct 24 '22 11:10

Sorry for the delay.

Skin Reaction Dataset overview: I'm working on the Skin Reaction dataset. Exposure to chemical agents can induce an immune reaction in susceptible individuals that leads to skin sensitization. Given a drug's SMILES string, can we predict whether it causes a skin reaction (1) or not (0)? The dataset contains 404 drugs.

Importing the Dataset: I have successfully installed the TDC package and imported the Skin Reaction dataset from TDC's Toxicity (Tox) group of single-instance prediction datasets.
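A minimal sketch of the import step, assuming the PyTDC package (`pip install PyTDC`), its `tdc.single_pred.Tox` loader, and a network connection for the download. The helper `summarize_labels` is hypothetical, added only to count actives and inactives.

```python
from collections import Counter


def load_skin_reaction():
    """Download the TDC Skin Reaction dataset as a pandas DataFrame.

    Assumes PyTDC is installed and the network is reachable. The import sits
    inside the function so the helper below also works without PyTDC.
    """
    from tdc.single_pred import Tox

    data = Tox(name="Skin Reaction")
    return data.get_data()  # columns: Drug_ID, Drug (SMILES string), Y (0/1 label)


def summarize_labels(labels):
    """Hypothetical helper: count total, active (1) and inactive (0) entries."""
    counts = Counter(int(y) for y in labels)
    return {"total": sum(counts.values()), "active": counts[1], "inactive": counts[0]}
```

With the real data, `summarize_labels(load_skin_reaction()["Y"])` should report 404 drugs in total.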

alaminumar • Oct 26 '22 09:10

Splitting the Dataset: I successfully split the dataset into three subsets.

  • Training set: used to train the model. It contains 283 drugs (196 active, 87 inactive).
  • Validation set: used to evaluate, tune and improve the model during development. It contains 40 drugs (29 active, 11 inactive).
  • Test set: used to assess the model after training and validation. It contains 81 drugs (49 active, 32 inactive).
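The 283/40/81 sizes are consistent with a 70/10/20 random split of 404 drugs, which is TDC's default for `get_split()`. That the default was used is an assumption, since the exact call is not stated. The sketch below checks only the arithmetic: it assumes the test and validation sizes are rounded from their fractions and the training set takes the remainder, and it verifies that the per-subset active/inactive counts add up. `split_sizes` is a hypothetical helper.

```python
def split_sizes(n, frac=(0.7, 0.1, 0.2)):
    """Hypothetical helper: (train, valid, test) sizes for a random split.

    Assumes the test and validation sizes are rounded from their fractions
    and the training set receives whatever remains.
    """
    _, valid_frac, test_frac = frac
    n_test = round(n * test_frac)
    n_valid = round(n * valid_frac)
    return n - n_valid - n_test, n_valid, n_test


# Class counts reported in this thread as (active, inactive) per subset.
reported = {"train": (196, 87), "valid": (29, 11), "test": (49, 32)}

sizes = dict(zip(("train", "valid", "test"), split_sizes(404)))
for name, (active, inactive) in reported.items():
    # Each subset's active + inactive count should equal its size.
    assert active + inactive == sizes[name], name

total_active = sum(a for a, _ in reported.values())
total_inactive = sum(i for _, i in reported.values())
print(sizes, total_active, total_inactive)
# {'train': 283, 'valid': 40, 'test': 81} 274 130
```

So the full dataset has 274 actives and 130 inactives: roughly two thirds of the drugs are labelled 1.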

alaminumar • Oct 26 '22 10:10

Data Visualization: I used matplotlib to plot the number of actives (1) and inactives (0) in our dataset. As the image shows, there are exactly two classes, so this is a binary classification problem, and the actives outnumber the inactives. [image: matplotlibimage]
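A minimal sketch of such a class-balance plot, assuming the labels are the 0/1 values from the dataset's `Y` column. The function name `plot_class_balance` and the output file name are hypothetical.

```python
from collections import Counter

import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt


def plot_class_balance(labels, path="class_balance.png"):
    """Save a bar chart of inactive (0) vs active (1) counts and return the counts."""
    counts = Counter(int(y) for y in labels)
    classes = [0, 1]
    heights = [counts[c] for c in classes]

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(["inactive (0)", "active (1)"], heights, color=["tab:blue", "tab:orange"])
    ax.set_ylabel("Number of drugs")
    ax.set_title("Skin Reaction: class balance")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # free the figure when called repeatedly
    return dict(zip(classes, heights))
```

Passing the full label column gives the dataset-wide picture, while passing each split's labels separately shows whether the imbalance is similar across training, validation and test sets.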

Using RDKit, we can visualize the molecular structure encoded by our SMILES strings. I successfully imported RDKit and drew one active and one inactive molecule. [image: rdkit]
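A sketch of drawing an active and an inactive molecule side by side, assuming RDKit is installed and the code runs as a plain Python script (outside a notebook, `Draw.MolsToGridImage` returns a PIL image that can be saved). `draw_pair` is a hypothetical helper name.

```python
from rdkit import Chem
from rdkit.Chem import Draw


def draw_pair(active_smiles, inactive_smiles, path="active_vs_inactive.png"):
    """Draw one active and one inactive molecule side by side and save a PNG."""
    mols = []
    for smi in (active_smiles, inactive_smiles):
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # RDKit returns None for SMILES it cannot parse
            raise ValueError(f"Invalid SMILES: {smi!r}")
        mols.append(mol)
    img = Draw.MolsToGridImage(
        mols,
        molsPerRow=2,
        subImgSize=(300, 300),
        legends=["active (1)", "inactive (0)"],
    )
    img.save(path)
    return mols
```

In practice the two SMILES would be taken from the `Drug` column of one row with `Y == 1` and one with `Y == 0`.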

alaminumar • Oct 27 '22 09:10

@GemmaTuron, can you review what I have done? Here is my Colab.

alaminumar • Oct 27 '22 13:10

Hi @alaminumar !

Good start, but can you provide an explanation of the model performances?

GemmaTuron • Oct 28 '22 12:10

Okay, Gemma. First, let me explain how we built our models.

**Model Training:** We train the model by giving it each drug's SMILES string as input (X) and its known label (Y) as the target; once trained, the model outputs a predicted bioactivity for new SMILES. We use the MorganBinaryClassifier from Lazy-QSAR for training, so we don't need to convert the SMILES into signatures ourselves: the featurization is done automatically.
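Lazy-QSAR handles featurization and model selection internally. The sketch below illustrates the same idea and is not Lazy-QSAR's actual implementation. It turns SMILES into Morgan fingerprints with RDKit and fits a scikit-learn random forest. The radius (2), bit length (2048), classifier choice and helper names are all assumptions.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestClassifier


def morgan_features(smiles_list, radius=2, n_bits=2048):
    """Turn SMILES strings into an (n_molecules, n_bits) array of Morgan bits."""
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # RDKit returns None for unparsable SMILES
            raise ValueError(f"Invalid SMILES: {smi!r}")
        fp = generator.GetFingerprint(mol)  # ExplicitBitVect of length n_bits
        arr = np.zeros((0,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)  # resizes arr and copies the bits
        rows.append(arr)
    return np.vstack(rows)


def train_skin_model(smiles_train, y_train, seed=0):
    """Fit a random forest on Morgan fingerprints (X) and 0/1 labels (Y)."""
    X = morgan_features(smiles_train)
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X, np.asarray(y_train))
    return model


def predict_active_proba(model, smiles):
    """Predicted probability that each molecule is active (class 1)."""
    return model.predict_proba(morgan_features(smiles))[:, 1]
```

The probabilities from `predict_active_proba` feed the ROC/AUROC evaluation. Thresholding them (for example at 0.5) gives the hard 0/1 predictions used for precision, recall and the confusion matrix.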

**Evaluating the Model:** To evaluate our model, we use the following:

  • Precision and Recall: precision is the fraction of drugs predicted positive that are truly positive, TP/(TP + FP); recall is the fraction of truly positive drugs that the model identifies, TP/(TP + FN).
  • AUROC value
  • ROC curve (the "AUC graph")
  • Confusion (contingency) matrix
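The formulas in the list above can be sketched in a few lines of plain Python, assuming binary 0/1 labels with 1 (active) as the positive class. `confusion_counts` and `precision_recall` are hypothetical helper names; `sklearn.metrics` provides `confusion_matrix`, `precision_score` and `recall_score` for the same quantities.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for 0/1 labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1  # active predicted active
        elif p == 1 and t == 0:
            fp += 1  # inactive predicted active
        elif p == 0 and t == 1:
            fn += 1  # active predicted inactive
        else:
            tn += 1  # inactive predicted inactive
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = TP/(TP + FP), recall = TP/(TP + FN); 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Note that scikit-learn's `confusion_matrix` lays the same four counts out as `[[tn, fp], [fn, tp]]`.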

To answer your question, Gemma: my model's performance in the first iteration was average to poor, with AUROC values of 0.61128 on the validation set and 0.7708 on the test set. So I doubled the training time to 3600 seconds. Here are the corresponding graphs and data for the second iteration.

Validation set: Precision 0.7368421052631579, Recall 0.9655172413793104

  • Contingency Matrix [image: ConfusionMatrix_Validation1] As we can see from the confusion matrix, 38 of the 40 validation drugs were predicted active, and 29 of the 40 (28 actives and 1 inactive) were predicted correctly.

  • ROC Curve [image: ROC_Validation1] AUROC: 0.6332288401253919
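AUROC can be read as the probability that a randomly chosen active receives a higher predicted score than a randomly chosen inactive, with ties counting half. A small pure-Python sketch of that definition follows. `auroc` is a hypothetical helper; `sklearn.metrics.roc_auc_score` computes the same value more efficiently.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (active, inactive) pairs ranked correctly.

    Ties in score count as half a correct pair. The pairwise loop is
    O(n_pos * n_neg), which is fine for a few hundred molecules.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one active and one inactive")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))
```

Random ranking gives about 0.5. Read against that baseline, the validation AUROC of about 0.63 suggests weak separation of actives from inactives, while about 0.78 on the test set is noticeably better.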

Test set: Precision 0.6125, Recall 1.0

  • Contingency Matrix [image: ConfusionMatrix_Test1]

  • ROC Curve [image: ROC_Test1] AUROC: 0.7822066326530612
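The reported precision and recall, together with each split's class counts, are enough to recover the confusion-matrix counts. This sketch assumes precision and recall were computed for the active class (1), and that the validation and test sets hold 29/11 and 49/32 actives/inactives as stated earlier in the thread. `counts_from_metrics` is a hypothetical helper.

```python
def counts_from_metrics(precision, recall, n_pos, n_neg):
    """Recover (tp, fp, fn, tn) from precision, recall and class sizes.

    recall = tp / n_pos gives tp; precision = tp / (tp + fp) gives fp.
    """
    tp = round(recall * n_pos)
    fn = n_pos - tp
    predicted_pos = round(tp / precision)  # tp + fp
    fp = predicted_pos - tp
    tn = n_neg - fp
    return tp, fp, fn, tn


for name, prec, rec, n_pos, n_neg in [
    ("validation", 0.7368421052631579, 0.9655172413793104, 29, 11),
    ("test", 0.6125, 1.0, 49, 32),
]:
    tp, fp, fn, tn = counts_from_metrics(prec, rec, n_pos, n_neg)
    accuracy = (tp + tn) / (n_pos + n_neg)
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy:.3f}")
```

This prints `validation: TP=28 FP=10 FN=1 TN=1 accuracy=0.725` and `test: TP=49 FP=31 FN=0 TN=1 accuracy=0.617`. In both sets the model labels almost every drug as active, which explains the high recall. This suggests that, at the threshold used, it rarely recognises inactives, even though the test AUROC shows its scores still carry some ranking information.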

alaminumar • Oct 29 '22 09:10

Sorry for the delay, @GemmaTuron. I had to deal with an emergency.

Here is the updated Colab.

alaminumar • Oct 29 '22 09:10

Hi @alaminumar

I hope everything is resolved. Good job on the modelling! I'll mark this as completed, and you can move on to finalising your Outreachy application!

GemmaTuron • Oct 31 '22 13:10