Effectively Annotate Text Data for Transformers via Active Learning using Cleanlab

Open aravindputrevu opened this issue 1 year ago • 8 comments

What does this PR do?

This PR demonstrates how to effectively annotate text data for Transformer models using active learning, specifically with the Cleanlab open-source package. It covers:

  • Introduction to active learning and its importance in efficiently utilizing labeling efforts under budget constraints.

  • Implementation of the ActiveLab algorithm, which prioritizes data for annotation by its potential impact on model performance. This is particularly useful with noisy annotators, because it helps decide whether to collect additional annotations for previously labeled data or to label new data.

  • A detailed walkthrough on iteratively improving a text classification model by selecting the most impactful data points for annotation, retraining the model, and evaluating its performance.
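
To make the selection step concrete, here is a minimal, dependency-free sketch of the idea of scoring examples and picking the next batch to annotate. It is not ActiveLab's actual formula and not the notebook's code: the scoring rule (treating the model as one extra annotator), the function names, and the toy data are all assumptions made for illustration. Cleanlab ships its own scoring in the `cleanlab.multiannotator` module, which is what the notebook relies on.

```python
from collections import Counter


def consensus_label(annotations):
    """Majority vote over annotator labels (ties go to the smallest label)."""
    counts = Counter(annotations)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def annotation_score(pred_probs, annotations):
    """Confidence in an example's current label; lower means more worth annotating.

    Simplified stand-in, NOT ActiveLab's real formula: the model counts as one
    extra annotator whose vote is its predicted probability for the consensus label.
    """
    if not annotations:  # unlabeled example: rely on model confidence alone
        return max(pred_probs)
    label = consensus_label(annotations)
    agreeing = sum(1 for a in annotations if a == label)
    return (pred_probs[label] + agreeing) / (1 + len(annotations))


def select_batch(examples, batch_size):
    """Return the ids of the batch_size lowest-scoring examples."""
    ranked = sorted(
        examples,
        key=lambda ex: annotation_score(ex["pred_probs"], ex["annotations"]),
    )
    return [ex["id"] for ex in ranked[:batch_size]]


# Hypothetical toy pool: two labeled examples and two unlabeled ones.
examples = [
    {"id": "a", "pred_probs": [0.9, 0.1], "annotations": [0, 0]},  # annotators agree
    {"id": "b", "pred_probs": [0.4, 0.6], "annotations": [0, 1]},  # annotators disagree
    {"id": "c", "pred_probs": [0.55, 0.45], "annotations": []},    # model unsure
    {"id": "d", "pred_probs": [0.05, 0.95], "annotations": []},    # model confident
]

print(select_batch(examples, batch_size=2))  # ['b', 'c']
```

In this toy pool the disagreeing labeled example `b` and the uncertain unlabeled example `c` are chosen, which mirrors the trade-off described above: one item gets a second opinion and one new item gets its first label. In the full loop, the selected items would be annotated, the model retrained, and the scores recomputed.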

Who can review?

@MKhalusova appreciate your review.

aravindputrevu avatar Mar 15 '24 21:03 aravindputrevu

Really interesting topic! Thank you for contributing! I have left some suggestions.

MKhalusova avatar Mar 18 '24 17:03 MKhalusova

I have addressed all the comments. Please take a look and let me know if you have any suggestions.

aravindputrevu avatar Mar 29 '24 01:03 aravindputrevu

@MKhalusova Updated title across the toc and index.md

aravindputrevu avatar Apr 02 '24 06:04 aravindputrevu

@aravindputrevu I'll give this a final review tomorrow :)

davanstrien avatar Apr 02 '24 17:04 davanstrien

@aravindputrevu this is a great lesson. There are still a few comments from Maria to respond to:

  • [x] https://github.com/huggingface/cookbook/pull/63#discussion_r1529009683 (replace Pandas DataFrame head)
  • [x] I left a few other minor comments in the notebook

Let me know if anything is unclear. I think we're almost ready to merge this :)

davanstrien avatar Apr 03 '24 09:04 davanstrien

@davanstrien - I have addressed the comments, please let me know.

aravindputrevu avatar Apr 03 '24 14:04 aravindputrevu

Thank you @davanstrien

aravindputrevu avatar Apr 07 '24 14:04 aravindputrevu