classy-classification
classy-classification copied to clipboard
This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.
Classy Classification
Have you every struggled with needing a Spacy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using sentence-transformers or spaCy models, provide a dictionary with labels and examples, or just provide a list of labels for zero shot-classification with Hugginface zero-shot classifiers.
Install
pip install classy-classification
Quickstart
SpaCy embeddings
import spacy
import classy_classification
data = {
"furniture": ["This text is about chairs.",
"Couches, benches and televisions.",
"I really need to get a new sofa."],
"kitchen": ["There also exist things like fridges.",
"I hope to be getting a new stove today.",
"Do you also have some ovens."]
}
nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
"text_categorizer",
config={
"data": data,
"model": "spacy"
}
)
print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
Sentence-transfomer embeddings
import spacy
import classy_classification
data = {
"furniture": ["This text is about chairs.",
"Couches, benches and televisions.",
"I really need to get a new sofa."],
"kitchen": ["There also exist things like fridges.",
"I hope to be getting a new stove today.",
"Do you also have some ovens."]
}
nlp = spacy.blank("en")
nlp.add_pipe(
"text_categorizer",
config={
"data": data,
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"device": "gpu"
}
)
print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
Hugginface zero-shot classifiers
import spacy
import classy_classification
data = ["furniture", "kitchen"]
nlp = spacy.blank("en")
nlp.add_pipe(
"text_categorizer",
config={
"data": data,
"model": "facebook/bart-large-mnli",
"cat_type": "zero",
"device": "gpu"
}
)
print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
#
# [{"label": "furniture", "score": 0.21}, {"label": "kitchen", "score": 0.79}]
Credits
Inspiration Drawn From
Huggingface does offer some nice models for few/zero-shot classification, but these are not tailored to multi-lingual approaches. Rasa NLU has a nice approach for this, but its too embedded in their codebase for easy usage outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Hugginface zero-shot, instead of default word embeddings. Finally, I decided to integrate with Spacy, since training a custom Spacy TextCategorizer seems like a lot of hassle if you want something quick and dirty.
Or buy me a coffee
Standalone usage without spaCy
from classy_classification import classyClassifier
data = {
"furniture": ["This text is about chairs.",
"Couches, benches and televisions.",
"I really need to get a new sofa."],
"kitchen": ["There also exist things like fridges.",
"I hope to be getting a new stove today.",
"Do you also have some ovens."]
}
classifier = classyClassifier(data=data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])
# overwrite training data
classifier.set_training_data(data=data)
classifier("I am looking for kitchen appliances.")
# overwrite [embedding model](https://www.sbert.net/docs/pretrained_models.html)
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
classifier("I am looking for kitchen appliances.")
# overwrite SVC config
classifier.set_svc(
config={
"C": [1, 2, 5, 10, 20, 100],
"kernels": ["linear"],
"max_cross_validation_folds": 5
}
)
classifier("I am looking for kitchen appliances.")
Todo
[ ] look into a way to integrate spacy trf models.
[ ] multiple clasifications datasets for a single input e.g. emotions and topic.