Add support for zero-shot classification

Open lewtun opened this issue 3 years ago • 2 comments

In our paper, we ran some experiments where adding synthetic data like:

{"text": "This sentence is <class-name-0>", "label": <class-label-0>}
{"text": "This sentence is <class-name-1>", "label": <class-label-1>}
...

and applying the SetFit method gave a boost in performance. In particular, one can use this technique for zero-shot classification, and we found that it typically outperforms the BART model used in the zero-shot-classification pipeline in transformers.

It would be nice to enable this feature by having a function like:

from typing import List, Union

from datasets import Dataset

def add_zeroshot_examples(dataset: Dataset, candidate_labels: Union[str, List[str]], template: str = "This sentence is {}") -> Dataset:
    # Apply logic to create `Dataset` from `template` and `candidate_labels`
    ...

This way one could have a workflow like:

from datasets import load_dataset

dataset = load_dataset("sst2", split="train")

dataset_with_zeroshot_examples = add_zeroshot_examples(dataset, candidate_labels=["negative", "positive"])
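
To make the idea concrete, here is a minimal sketch of the record-building step only, in plain Python. The helper name `build_zeroshot_records` is hypothetical, and mapping each label name to its position in the list is an assumption; neither is part of SetFit.

```python
from typing import Dict, List, Union


def build_zeroshot_records(
    candidate_labels: Union[str, List[str]],
    template: str = "This sentence is {}",
) -> List[Dict[str, Union[str, int]]]:
    """Return one synthetic example per candidate label (hypothetical helper)."""
    # Accept a single label name as well as a list of names.
    if isinstance(candidate_labels, str):
        candidate_labels = [candidate_labels]
    # Assumption: the integer label is the position of the name in the list.
    return [
        {"text": template.format(name), "label": idx}
        for idx, name in enumerate(candidate_labels)
    ]


records = build_zeroshot_records(["negative", "positive"])
```

These records could then be turned into a `Dataset` (for example with `Dataset.from_dict`) and appended to the training split with `concatenate_datasets`, provided the column names and feature types match.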

lewtun avatar Oct 07 '22 12:10 lewtun

Big vote for it! Was actually questioning if this could work.

Raidus avatar Oct 23 '22 03:10 Raidus

I'll make a PR for this.

pdhall99 avatar Oct 23 '22 21:10 pdhall99

Sounds like this has been implemented!

tomaarsen avatar Dec 13 '22 22:12 tomaarsen