toxic_comment_classification

Predict Toxic Comments in the wild

Toxic Comment Classification

This is my code for the Toxic Comment Classification competition hosted on Kaggle. It is substantially modified from the base code here.

To download the datasets, run get_data.sh.

The Task

The dataset consists of comments from Wikipedia's talk page edits: a large number of Wikipedia comments that have been labeled by human raters for toxic behavior. A comment can carry several labels at once. The types of toxicity are:

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate
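
Because each comment can have any combination of the six labels, this is a multi-label problem, so each comment gets six independent 0/1 targets rather than a single class. The sketch below shows one way to read such targets, assuming a CSV with a `comment_text` column and one 0/1 column per label name (the layout of the competition's train.csv). The function name `load_multilabel` and the sample rows are hypothetical, not part of this repo.

```python
import csv
import io

# The six toxicity types; a comment may carry any number of them (multi-label).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_multilabel(csv_file):
    """Return (texts, targets) where targets[i] is a list of six 0/1 ints.

    Assumes a header containing `comment_text` plus one 0/1 column per label
    (an assumption about the file layout; hypothetical helper).
    """
    texts, targets = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["comment_text"])
        targets.append([int(row[label]) for label in LABELS])
    return texts, targets


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real training file.
    sample = io.StringIO(
        "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
        "a1,Thanks for the edit,0,0,0,0,0,0\n"
        "a2,You are an idiot,1,0,0,0,1,0\n"
    )
    texts, targets = load_multilabel(sample)
    for text, target in zip(texts, targets):
        active = [label for label, flag in zip(LABELS, target) if flag]
        print(text, "->", active or ["clean"])
    # prints:
    # Thanks for the edit -> ['clean']
    # You are an idiot -> ['toxic', 'insult']
```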

The Approach

Create an ensemble model that predicts the probability of each type of toxicity for each comment. A full explanation of my approach is documented here.
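
This README does not say how the ensemble members are combined. One common way to blend models that each output six per-label probabilities is a weighted average, sketched below. The `blend` function, the weights, and the model outputs in the usage example are hypothetical and only illustrate the idea.

```python
from typing import List, Sequence


def blend(predictions: Sequence[Sequence[Sequence[float]]],
          weights: Sequence[float]) -> List[List[float]]:
    """Weighted average of per-model probability matrices (hypothetical helper).

    predictions[m][i][j] is model m's probability that comment i has label j.
    Weights are normalised to sum to 1, so blended values stay in [0, 1].
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_comments = len(predictions[0])
    n_labels = len(predictions[0][0])
    blended = [[0.0] * n_labels for _ in range(n_comments)]
    for model_probs, w in zip(predictions, weights):
        share = w / total
        for i in range(n_comments):
            for j in range(n_labels):
                blended[i][j] += share * model_probs[i][j]
    return blended


if __name__ == "__main__":
    # Hypothetical outputs of two models for one comment, six labels each.
    model_a = [[0.9, 0.1, 0.6, 0.0, 0.7, 0.0]]
    model_b = [[0.7, 0.3, 0.4, 0.0, 0.5, 0.2]]
    print([round(p, 2) for p in blend([model_a, model_b], [0.5, 0.5])[0]])
    # prints [0.8, 0.2, 0.5, 0.0, 0.6, 0.1]
```

Averaging probabilities keeps each label's output a valid probability, and the weights let stronger models count for more; rank averaging or a stacked meta-model are alternatives.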

Install Pre-requisites

Run install.sh, then run pip install -r requirements.txt.

Tips

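  • Make sure the text is preprocessed the same way the embeddings were originally preprocessed (for example, the same casing and tokenization), so that the highest possible percentage of words can be matched to embedding vectors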
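
A quick way to act on this tip is to measure embedding coverage under different preprocessing choices and keep the one that matches the most words. The sketch below does that; the tiny vocabulary stands in for a real embedding file, and the `coverage`, `lower_split`, and `keep_case_split` functions are hypothetical, not part of this repo.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple


def coverage(texts: Iterable[str],
             embedding_vocab: Set[str],
             preprocess: Callable[[str], List[str]]) -> Tuple[float, float]:
    """Return (vocab coverage, token coverage) of the embeddings.

    vocab coverage: share of distinct tokens that have a vector.
    token coverage: share of all token occurrences that have a vector.
    """
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    if not counts:
        return 0.0, 0.0
    found = [tok for tok in counts if tok in embedding_vocab]
    vocab_cov = len(found) / len(counts)
    token_cov = sum(counts[tok] for tok in found) / sum(counts.values())
    return vocab_cov, token_cov


def lower_split(text: str) -> List[str]:
    # Hypothetical preprocessing A: lowercase, keep only word characters.
    return re.findall(r"\w+", text.lower())


def keep_case_split(text: str) -> List[str]:
    # Hypothetical preprocessing B: keep case, emit punctuation as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    # Made-up cased vocabulary standing in for real embeddings.
    vocab = {"You", "are", "great", "!"}
    texts = ["You are great!"]
    for name, fn in [("lower", lower_split), ("keep case", keep_case_split)]:
        v, t = coverage(texts, vocab, fn)
        print(name, round(v, 2), round(t, 2))
    # prints:
    # lower 0.67 0.67
    # keep case 1.0 1.0
```

Here lowercasing loses the cased token "You", so the preprocessing that mirrors the embeddings' own casing and punctuation handling covers more of the text.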