Weak-Supervised-Learning-Case-Study

Exploring NLP weak supervision approaches to train text classification models. The project is also a prototype for a semi-automated text data labelling platform. Approaches: Snorkel and Zero-Shot Lear...

A Case Study on Weakly Supervised Learning

View our write-up of the project here: A Case Study on Weakly Supervised Learning.

This project was created for the Full Stack Deep Learning 2021 course. It was chosen as one of the top projects from the course and presented at the project showcase.

Goals of the project

  • Create a text data labeling service where the user inputs text data and receives a labeled dataset.
  • Experiment with weakly supervised learning and compare different approaches.

Notebooks

How to use this Project

To use only the Snorkel approach to weak supervision, run the following notebooks in this order: 01, 03, 05, 06.
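
For readers new to the idea behind the Snorkel notebooks, the sketch below illustrates weak labeling with labeling functions in plain Python. It is not Snorkel's API and not the code from the notebooks: the keyword lists, the function names, and the majority-vote combiner are all assumptions made for illustration. Snorkel itself can combine labeling-function votes with a learned label model rather than a simple vote.

```python
from collections import Counter

# Label constants. Using -1 for "abstain" mirrors the common Snorkel convention.
ABSTAIN = -1
NOT_TOXIC = 0
TOXIC = 1


def lf_insult_keywords(text):
    """Vote TOXIC when a (hypothetical) insult keyword appears."""
    keywords = {"idiot", "stupid", "moron"}
    return TOXIC if any(word in text.lower() for word in keywords) else ABSTAIN


def lf_polite_phrases(text):
    """Vote NOT_TOXIC when a (hypothetical) polite phrase appears."""
    phrases = ("thank you", "thanks", "please")
    return NOT_TOXIC if any(p in text.lower() for p in phrases) else ABSTAIN


def lf_shouting(text):
    """Vote TOXIC when a longer text is written almost entirely in capitals."""
    letters = [c for c in text if c.isalpha()]
    if len(letters) >= 10 and sum(c.isupper() for c in letters) / len(letters) > 0.8:
        return TOXIC
    return ABSTAIN


LABELING_FUNCTIONS = [lf_insult_keywords, lf_polite_phrases, lf_shouting]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]
```

Texts that receive a non-abstain label can then be used as a noisy training set for a downstream classifier, which is the role the later notebooks in the sequence play.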

To use only the model distillation approach to weak supervision, run the following notebooks in this order: 02, 04.
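
The distillation idea can be sketched in a few lines: a teacher labels unlabeled text, and a smaller student is trained on those pseudo-labels. The code below is a toy illustration only and makes several assumptions: `teacher_predict` is a keyword-overlap stand-in for a real zero-shot model, `NaiveBayesStudent` is a hypothetical bag-of-words student, and the label hints are invented. The notebooks use actual transformer models instead.

```python
import math
from collections import Counter, defaultdict


def teacher_predict(text, label_hints):
    """Stand-in for a zero-shot teacher: pick the label whose hint words
    overlap most with the text, or None when nothing overlaps."""
    tokens = set(text.lower().split())
    scores = {label: len(tokens & hints) for label, hints in label_hints.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class NaiveBayesStudent:
    """Tiny multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())

        def log_prob(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            return score

        return max(self.label_counts, key=log_prob)


def distill(unlabeled_texts, label_hints):
    """Label texts with the teacher, drop unlabeled ones, train the student."""
    pairs = [(t, teacher_predict(t, label_hints)) for t in unlabeled_texts]
    pairs = [(t, y) for t, y in pairs if y is not None]
    texts, labels = zip(*pairs)
    return NaiveBayesStudent().fit(texts, labels)


# Illustrative data only; the label names and hint words are assumptions.
LABEL_HINTS = {
    "Company": {"company", "corporation"},
    "Athlete": {"footballer", "athlete"},
}
UNLABELED = [
    "acme is a software company based in berlin",
    "globex corporation sells software worldwide",
    "john smith is a footballer who plays as striker",
    "jane doe is an olympic athlete and sprinter",
    "this sentence matches nothing",
]
student = distill(UNLABELED, LABEL_HINTS)
```

The student never sees the hint words as rules. It only learns from the teacher's pseudo-labels, so it can classify text such as "a software startup in berlin" that the keyword teacher would leave unlabeled.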

To deploy this project as a Streamlit app, see the README in our webapp directory.

Project Tree

.
|-- ./pyproject.toml
|-- ./requirements
|   |-- ./requirements/dev.in
|   |-- ./requirements/dev.txt
|   |-- ./requirements/prod.in
|   `-- ./requirements/prod.txt
|-- ./setup.cfg
|-- ./project_proposal.md
|-- ./tasks
|   `-- ./tasks/lint.sh
|-- ./Dockerfile
|-- ./distill_classifier.py
|-- ./service.py
|-- ./test_request.json
|-- ./train_baseline_dbpedia_model.py
|-- ./tree-md
|-- ./text_classifier
|   |-- ./text_classifier/__init__.py
|   |-- ./text_classifier/models
|   |   `-- ./text_classifier/models/__init__.py
|   |-- ./text_classifier/lit_models
|   |   `-- ./text_classifier/lit_models/__init__.py
|   `-- ./text_classifier/notebooks
|       |-- ./text_classifier/notebooks/01_dbpedia_14_bert_classification_exploration.ipynb
|       |-- ./text_classifier/notebooks/04_transformers-multi-label-classification-toxicity.ipynb
|       |-- ./text_classifier/notebooks/03_dbpedia_14_snorkel_dataset_labeling.ipynb
|       |-- ./text_classifier/notebooks/05_toxicity_classification_snorkel_dataset.ipynb
|       |-- ./text_classifier/notebooks/02_dbmedia_14_distilling_with_zero_shot_classification.ipynb
|       `-- ./text_classifier/notebooks/06_AMLS_model_deployment.ipynb
|-- ./data
|   |-- ./data/toxic_comments
|   |   |-- ./data/toxic_comments/test.csv
|   |   |-- ./data/toxic_comments/toxic_dev_200_examples.csv
|   |   |-- ./data/toxic_comments/toxic_test_630_examples.csv
|   |   |-- ./data/toxic_comments/toxic_train_2100_examples.csv
|   |   |-- ./data/toxic_comments/toxic_val_70_examples.csv
|   |   |-- ./data/toxic_comments/train.csv
|   |   |-- ./data/toxic_comments/toxicity_snorkel_dataset_3014ex.csv
|   |   `-- ./data/toxic_comments/toxicity_test_675ex.csv
|   `-- ./data/readme.md
|-- ./README.md
`-- ./webapp
    |-- ./webapp/Dockerfile
    |-- ./webapp/app.py
    |-- ./webapp/backend.py
    |-- ./webapp/demo_config.json
    |-- ./webapp/requirements.txt
    |-- ./webapp/run_webapp.sh
    |-- ./webapp/utils.py
    `-- ./webapp/README.md

Project Proposal

Find our project proposal here.