cookbook
Effectively Annotate Text Data for Transformers via Active Learning using Cleanlab
What does this PR do?
This PR adds a cookbook notebook that demonstrates how to annotate text data effectively for Transformer models using active learning with the open-source Cleanlab package. It covers:
- Introduction to active learning and its importance in efficiently utilizing labeling efforts under budget constraints.
- Implementation of the ActiveLab algorithm, which assists in prioritizing data for annotation based on the potential impact on model performance. This is particularly beneficial when dealing with noisy annotators, as it helps in deciding whether to seek additional annotations for previously labeled data or new data.
- A detailed walkthrough on iteratively improving a text classification model by selecting the most impactful data points for annotation, retraining the model, and evaluating its performance.
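For reviewers skimming the description, the selection step in each round can be sketched roughly as follows. This is a simplified illustration and not the notebook's code: `select_batch_to_annotate` is a hypothetical helper, and it assumes per-example scores have already been computed elsewhere (for instance by cleanlab's `get_active_learning_scores`), with lower scores meaning another label would be more informative.

```python
from typing import List, Sequence, Tuple


def select_batch_to_annotate(
    scores_labeled: Sequence[float],
    scores_unlabeled: Sequence[float],
    batch_size: int,
) -> Tuple[List[int], List[int]]:
    """Pick the batch_size lowest-scoring examples across both pools.

    Hypothetical helper for illustration. Returns a pair of index lists:
    examples in the labeled pool to re-annotate, and examples in the
    unlabeled pool to annotate for the first time.
    """
    # Tag every score with its pool so both pools can be ranked together.
    candidates = [(s, "labeled", i) for i, s in enumerate(scores_labeled)]
    candidates += [(s, "unlabeled", i) for i, s in enumerate(scores_unlabeled)]

    # Assumed convention: a lower score means the example is more worth annotating.
    candidates.sort(key=lambda c: c[0])
    chosen = candidates[:batch_size]

    relabel = [i for _, pool, i in chosen if pool == "labeled"]
    new = [i for _, pool, i in chosen if pool == "unlabeled"]
    return relabel, new


# Toy scores, made up purely to show the call shape.
relabel_idx, new_idx = select_batch_to_annotate(
    scores_labeled=[0.9, 0.2, 0.7],
    scores_unlabeled=[0.4, 0.1, 0.8],
    batch_size=3,
)
```

Ranking both pools on one scale is what lets a single annotation budget be split between re-checking noisy labels and labeling new data. After each batch is annotated, the model is retrained and the scores are recomputed before the next round.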
Who can review?
@MKhalusova appreciate your review.
Check out this pull request on ReviewNB
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Really interesting topic! Thank you for contributing! I have left some suggestions.
I have addressed all the comments. Please take a look and let me know if you have any suggestions.
@MKhalusova Updated title across the toc and index.md
@aravindputrevu I'll give this a final review tomorrow :)
@aravindputrevu this is a great lesson. There are still a few comments from Maria to respond to:
- [x] https://github.com/huggingface/cookbook/pull/63#discussion_r1529009683 (replace Pandas DataFrame head)
- [x] I left a few other minor comments in the notebook
Let me know if anything is unclear, think we're almost ready to merge this :)
@davanstrien - I have addressed the comments, please let me know.
Thank you @davanstrien