One-hot encoded labels don't work with a non-default loss_class

Open ps24601 opened this issue 2 years ago • 0 comments

One-hot encoded labels for multi-class data don't work with a non-default loss_class (e.g. BatchHardTripletLoss, ...). Note: I'm not using SetFitHead, but scikit-learn one-vs-rest.

TypeError: unhashable type: 'list'

The error comes from line 353 in trainer.py: train_data_sampler = SentenceLabelDataset(train_examples, samples_per_label=self.samples_per_label), which uses SentenceLabelDataset from sentence-transformers.
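A minimal sketch of why a list-valued label could trigger this TypeError, assuming the sampler groups examples by label with the label as a dictionary key. The group_by_label helper below is hypothetical and is not the actual SentenceLabelDataset code; it only reproduces the same kind of failure in plain Python.

```python
from collections import defaultdict


def group_by_label(texts, labels):
    """Hypothetical helper: bucket texts by label, using the label as a dict key."""
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        # Dict keys must be hashable: an int works, a list does not.
        groups[label].append(text)
    return dict(groups)


texts = ["a 1", "b 1", "c 1", "a 2", "b 2"]
categorical = [0, 1, 2, 0, 1]
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]

# Integer labels are hashable, so grouping succeeds.
grouped = group_by_label(texts, categorical)

# List labels are unhashable, so the dict lookup raises TypeError.
try:
    group_by_label(texts, one_hot)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

If the real sampler does something similar, any label that is a Python list would fail at the first lookup, regardless of the loss class's own requirements.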

However, when I change the labels to categorical (integer) encoding, it works.

This works: dataset = Dataset.from_dict({"text": ["a 1", "b 1", "c 1", "a 2", "b 2"], "label": [0, 1, 2, 0, 1]})

This doesn't work: dataset = Dataset.from_dict({"text": ["a 1", "b 1", "c 1", "a 2", "b 2"], "label": [[1,0,0], [0,1,0], [0,0,1], [1,0,0], [0,1,0]]})
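As a possible workaround, the one-hot rows can be converted to integer class indices before building the dataset. This is only a sketch: one_hot_to_index is a hypothetical helper, not part of setfit, and it assumes each row has exactly one 1 (single-label multi-class, not multi-label).

```python
def one_hot_to_index(row):
    """Hypothetical helper: return the class index of a strict one-hot row."""
    # Reject rows that are not strictly one-hot (e.g. multi-label rows).
    if any(value not in (0, 1) for value in row) or sum(row) != 1:
        raise ValueError(f"not a one-hot row: {row!r}")
    return row.index(1)


one_hot_labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]

# Integer labels in the same form as the working example above.
categorical_labels = [one_hot_to_index(row) for row in one_hot_labels]
```

The resulting categorical_labels list can then be used as the "label" column in Dataset.from_dict, matching the example that works.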

Maybe this is intended, but I couldn't find anywhere in the documentation that this should be avoided, and the example notebooks only talk about using one-hot encoded labels.

ps24601, Jul 28 '23 14:07