setfit
Add support for zero-shot classification
In our paper, we ran some experiments where adding synthetic data like:
{"text": "This sentence is <class-name-0>", "label": <class-label-0>}
{"text": "This sentence is <class-name-1>", "label": <class-label-1>}
...
and applying the SetFit method gave a boost in performance. In particular, one can use this technique for zero-shot classification, and we found that it typically outperforms the BART model used in the zero-shot-classification pipeline in transformers.
It would be nice to enable this feature by having a function like:
from typing import List, Union

from datasets import Dataset

def add_zeroshot_examples(dataset: Dataset, candidate_labels: Union[str, List[str]], template: str = "This sentence is {}") -> Dataset:
    # Apply logic to create `Dataset` from `template` and `candidate_labels`
    ...
This way one could have a workflow like:
from datasets import load_dataset
dataset = load_dataset("sst2", split="train")
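# `candidate_labels` has no default in the proposed signature, so it is passed explicitly
dataset_with_zeroshot_examples = add_zeroshot_examples(dataset, candidate_labels=["negative", "positive"])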
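To make the proposal concrete, here is a minimal sketch of how the synthetic rows could be generated from `template` and `candidate_labels`. The helper name `build_zeroshot_rows` is hypothetical and not part of SetFit, and using each label's position in the list as its integer id is an assumption. It returns plain dictionaries, so it runs without any dependencies:

```python
from typing import Dict, List, Union


def build_zeroshot_rows(
    candidate_labels: Union[str, List[str]],
    template: str = "This sentence is {}",
) -> List[Dict[str, Union[str, int]]]:
    """Create one synthetic row per candidate label (hypothetical helper)."""
    # Accept a single label name as well as a list of names.
    if isinstance(candidate_labels, str):
        candidate_labels = [candidate_labels]
    # Assumption: the integer label is the position of the name in the list.
    return [
        {"text": template.format(name), "label": idx}
        for idx, name in enumerate(candidate_labels)
    ]


rows = build_zeroshot_rows(["negative", "positive"])
print(rows)
```

The rows could then be wrapped with `Dataset.from_list(rows)` and merged into an existing dataset with `concatenate_datasets`, provided the column names and feature types match. A real implementation might also want to emit several rows per label.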
Big vote for it! Was actually questioning if this could work.
I'll make a PR for this.
Sounds like this has been implemented!