Zero-Shot Learning in Modern NLP | Joe Davison Blog
State-of-the-art NLP models for text classification without annotated data
Nice blog - I only had time to skim through the high level of each method. Which method does the transformers pipeline use?
Thanks! The pipeline uses the NLI method.
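For context, a minimal sketch of what the NLI method does under the hood: the text becomes the premise, each candidate label is slotted into a hypothesis, and the entailment probability serves as the label score. The example inputs are illustrative; the label order shown is correct for facebook/bart-large-mnli specifically.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "Who are you voting for in 2020?"
hypothesis = "This example is politics."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
logits = model(**inputs).logits  # order: [contradiction, neutral, entailment]

# Discard "neutral" and softmax over contradiction vs. entailment.
entail_prob = logits[0, [0, 2]].softmax(dim=0)[1]
print(f"P(entailment) = {entail_prob.item():.3f}")
```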
This article is brilliantly written!
Thank you, perfect article. Could you please suggest the most suitable way to classify a text (containing N sentences) into an expected label?
Thank you, amazing work. Can I see the code behind your online demo please?
@hishamkhrayzat51 Yeah the repo is here.
Hello, I'd like to know how many GPUs your zero-shot topic classification API is running on, because when I try to scan a 50-sentence text with 10 topics on Colab, it takes approximately 5 minutes per text. It looks like it's way faster on your web API, though.
Thank you for your answer,
Clotilde
@clotildemiura It's slow if you're not on GPU since you have to run each text/candidate label pair through the model separately. If the web API is significantly faster, it's probably just because the results for examples you're looking at are cached. The web API is also just using CPU.
A few tips for speeding up the pipeline here.
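For readers landing here, a minimal sketch of the pipeline itself; the model name is the standard NLI checkpoint and the inputs are illustrative. Passing device=0 puts the model on the first GPU, which is the single biggest speedup:

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=0,  # -1 (the default) runs on CPU, which is much slower
)

result = classifier(
    "Who are you voting for in 2020?",
    candidate_labels=["politics", "economics", "public health"],
)
print(result["labels"][0], result["scores"][0])
```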
thank you very much @joeddav
This is very interesting.
I had read two other papers on zero-shot learning some time ago. The key ideas were:

Really great article, Joe! This will especially work for English text, right? What would you advise for non-English languages that don't have MNLI datasets or NLI-trained BERT models?
@gevezex Yep, I actually trained a model on a multilingual NLI dataset for this exact purpose! Tweet here: https://twitter.com/joeddav/status/1298997753075232772
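A quick sketch of the multilingual variant, assuming the checkpoint announced in that tweet is joeddav/xlm-roberta-large-xnli; the text and labels here are illustrative, and they need not be in the same language:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")
classifier("¿A quién votarás en 2020?",
           candidate_labels=["política", "economía", "deportes"])
```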
Hey Joe, great article!
I have a silly question about this step in the few-shot learning part of the embedding approach:
Take the top K most frequent words V in the vocabulary of a word2vec model
By the top K most frequent words, do you mean the top K from the corpus you are trying to classify?
Thanks for the multilingual NLI, btw!
@agombert Glad you enjoyed it! Sorry, this was difficult to communicate. Word vector files typically order the words by descending frequency in the algorithm's training corpus. I meant the top K according to that ordering. So if you have a .vec file with 100k words (lines), just use the first K.
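In code, "take the first K lines" could look like the sketch below. The path and K are placeholders; .vec files (e.g. fastText's) typically start with a "num_words dim" header line, then one word per line in roughly descending training-corpus frequency:

```python
import numpy as np

def load_top_k(path="wiki-news-300d-1M.vec", k=20000):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "num_words dim" header line
        for i, line in enumerate(f):
            if i >= k:
                break
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors
```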
Wondering about using bigrams in the candidate labels, e.g. candidate_labels = ["not sustainable", "climate change", "environment pollution", "government state policy", "finance bank"]. Will these work? I think bigrams could add more context.
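Multi-word labels work out of the box: each label is simply substituted into a hypothesis template, so "climate change" becomes "This text is about climate change." A quick sketch with illustrative inputs:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
classifier(
    "Rising sea levels are forcing coastal cities to adapt.",
    candidate_labels=["not sustainable", "climate change",
                      "environment pollution", "government state policy",
                      "finance bank"],
    hypothesis_template="This text is about {}.",
)
```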
Fantastic article!
Just a minor fix: the model name in the last code snippet should be facebook/bart-large-mnli.
Fascinating article, Joe! Is there any resource available on how to fine-tune such models with our own data? Thanks
Really great article, keep it up!
Hi Joe, thanks for your article! Is it possible to fine-tune these models?
@joeddav thanks for the article. I find it very helpful.
Do you happen to have the notebook/code available for the mapping from S-BERT to word2vec? I wonder how it is done, and also how you generate the word2vec embedding for phrases such as "Science and Mathematics". 🤔
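A sketch of the least-squares projection idea described in the post, with random stand-in arrays: fit a matrix Z mapping S-BERT space to word2vec space on the top-K words, then apply it to any sentence embedding. For a phrase label like "Science and Mathematics", averaging its individual word vectors is one plausible choice, though that is an assumption here rather than the post's confirmed recipe:

```python
import numpy as np

K, d_sbert, d_w2v = 10000, 768, 300
sbert_word_vecs = np.random.randn(K, d_sbert)  # S-BERT embeddings of the top-K words
w2v_word_vecs = np.random.randn(K, d_w2v)      # word2vec vectors of the same words

# Solve min_Z ||sbert_word_vecs @ Z - w2v_word_vecs||^2.
Z, *_ = np.linalg.lstsq(sbert_word_vecs, w2v_word_vecs, rcond=None)

def to_w2v_space(sentence_embedding):
    # Project an S-BERT sentence embedding into word2vec space.
    return sentence_embedding @ Z
```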
Hi, thanks a lot for the article and notebook. Just a quick question: what is the default model in the pipeline? Is it BART MNLI?
Can you please show me, or direct me to a place that explains, how the fine-tuning is done? I have about 1,000 sentences with their labels and want to fine-tune this model on that task. During inference, only a subset of the labels will be used, so zero-shot learning would be the best way to go. But when you say "pass the sentence twice, once with correct label and once with incorrect label while optimising cross-entropy", I want to see how that is done with Hugging Face.
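For illustration, a hedged sketch of that "sentence twice" recipe, not the author's actual code: each labeled sentence yields one entailment pair with the true label and one contradiction pair with a random wrong label, and the model's built-in cross-entropy loss does the rest. The toy data and label ids are assumptions; check config.id2label for your checkpoint (for facebook/bart-large-mnli, 0 = contradiction and 2 = entailment):

```python
import random
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
CONTRADICTION, ENTAILMENT = 0, 2  # bart-large-mnli's class ids

sentences = ["The stock market fell sharply today."]   # your texts
gold = ["business"]                                     # their gold labels
label_set = ["business", "sports", "politics"]

premises, hypotheses, targets = [], [], []
for text, label in zip(sentences, gold):
    wrong = random.choice([l for l in label_set if l != label])
    # One entailment pair (true label) and one contradiction pair (wrong label).
    for cand, tgt in [(label, ENTAILMENT), (wrong, CONTRADICTION)]:
        premises.append(text)
        hypotheses.append(f"This example is {cand}.")
        targets.append(tgt)

batch = tokenizer(premises, hypotheses, truncation=True, padding=True,
                  return_tensors="pt")
batch["labels"] = torch.tensor(targets)
loss = model(**batch).loss  # cross-entropy over the NLI classes
loss.backward()             # plug into your optimizer or Trainer loop
```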
As @Boodhayana said, I would also love to see the actual code that carries out the fine-tuning. I also have a dataset that I want to fine-tune the bart-mnli zero-shot model on, but I can't find any examples of how to do so.
Could you please post the code you used to fine-tune bart-large-mnli on Yahoo Answers?
Regrettably, I failed to save that code. If you need to fine-tune, I recommend first distilling a classifier using this script (https://github.com/huggingface/transformers/tree/main/examples/research_projects/zero-shot-distillation) and then fine-tuning the resulting model as you would any other classifier.
@joeddav np at all. I am able to successfully fine-tune the model. Your blog and your answers on the Hugging Face forums helped me a lot. I have one concern, however: since I am using the fine-tuned model in production, I need it to be fast (as fast as normal text classification models). I have ~30 labels in my dataset. I am accelerating inference by running the fine-tuned Hugging Face model with onnxruntime.
The command for the ONNX export is below:
python -m transformers.onnx --model=facebook/bart-large-mnli --feature=sequence-classification --atol=1e-04 dir/
Even after that, inference for one piece of text takes almost 2 seconds (it has to iterate through 30 labels).
Are there any methods to further speed up inference?
Does distillation help? Are there any other methods I can use along with this? I want to match the inference time of normal text classification.
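For reference, a minimal sketch of loading and running the exported model with onnxruntime; the dir/model.onnx path follows the export command above, and the example inputs are illustrative:

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
session = ort.InferenceSession("dir/model.onnx")

# One (premise, hypothesis) pair; in practice you'd loop or batch over labels.
inputs = tokenizer("The plot kept me on the edge of my seat.",
                   "This example is thriller.",
                   return_tensors="np")
feed = {k: v.astype("int64") for k, v in inputs.items()}
logits = session.run(None, feed)[0]  # NLI logits
```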
@Boodhayana Distillation is exactly what you want. It will essentially train a student model, which is just a normal DistilBERT classifier, to mimic the predictions of the zero-shot teacher. You just need some example (unlabeled) data.
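A sketch of that idea, not the official distillation script: the zero-shot teacher produces a soft label distribution over your candidate labels for each unlabeled text, and a small standard classifier is trained to match it. Model names and the toy example are illustrative:

```python
import torch
import torch.nn.functional as F
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          pipeline)

labels = ["thriller", "action", "comedy"]
text = "A detective races to stop a bomb before dawn."  # unlabeled example

# 1. Teacher: the zero-shot pipeline gives soft targets.
teacher = pipeline("zero-shot-classification",
                   model="facebook/bart-large-mnli")
out = teacher(text, candidate_labels=labels)
soft = torch.tensor([out["scores"][out["labels"].index(l)] for l in labels])

# 2. Student: an ordinary DistilBERT classifier trained on those targets.
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels))
enc = tok(text, return_tensors="pt")
log_probs = F.log_softmax(student(**enc).logits, dim=-1)
loss = F.kl_div(log_probs, soft.unsqueeze(0), reduction="batchmean")
loss.backward()  # repeat over your whole unlabeled set with an optimizer
```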
@Boodhayana Can you share, or direct me to a place to understand, how the fine-tuning is actually done?
@joeddav for distillation, what should the candidate labels be? I think it should be the candidate labels you want to use for your application, regardless of what the text you're using for distillation is about. For example, if I want to train a model to classify movie summaries into genres, I could use the AG News data to distill a zero-shot model into a smaller one, using hypothesis labels like ['thriller', 'action', 'suspense', 'horror', 'comedy'], even though the AG News data has nothing to do with that. Then I could fine-tune that distilled model with actual movie summary/genre data, right?
Hey, thank you for getting back to me. I'm very excited to see that post! In the meantime, I'm actually trying my hand with PyTorch, and I'm wondering how to encode my labels. As suggested in the zero-shot learning blog post, I'm only using the labels entailment and contradiction, but I'm unsure what the actual encodings used in the model are. From this Kaggle competition, https://www.kaggle.com/competitions/contradictory-my-dear-watson, I saw they're using 0, 1, or 2 (corresponding to entailment, neutral, and contradiction). Should I set up my encodings this way too (0 for entailment and 2 for contradiction)?
Thank you,
Tada
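On the encoding question: rather than guessing, it's safest to read the mapping off the checkpoint's config, since different NLI resources order the classes differently (the Kaggle ordering above is not the same as bart-large-mnli's). A two-line check:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/bart-large-mnli")
print(config.id2label)
# -> {0: 'contradiction', 1: 'neutral', 2: 'entailment'} for this model
```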
I plan to write a blog post using a public dataset, so please wait a few days, since I am using a private dataset that I can't share outside.