
Remaining issues with additional datasets

Open yoavkatz opened this issue 2 years ago • 2 comments

There are a few open issues:

1. There is no multi_label template (a fix is required for unfair_tos and reuters).
2. Can I use the text_type argument? I am not sure whether dbpedia_14 is of type text or paragraph.
3. I do not remember why I needed to add ListFieldValues(fields=["label"], to_field="label"). Maybe it's not relevant anymore.
4. There are some datasets where we played with the splits. For example, in law_stack_exchange.py we swapped train and test, and in financial_tweets validation went to test. Is it OK to do this?
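To make the split question concrete, here is a minimal sketch in plain Python of the two remappings mentioned above (swapping train and test, and moving validation to test). It does not use unitxt; `remap_splits` and the toy rows are hypothetical names made up for illustration, and the dataset cards may do this differently.

```python
# Hypothetical helper, not part of unitxt: rename the splits of a dataset
# that is represented as a plain dict of {split_name: rows}.
def remap_splits(splits, mapping):
    """Return a new dict with split names renamed according to `mapping`.

    Splits that do not appear in `mapping` keep their original name.
    """
    renamed = {}
    for name, rows in splits.items():
        new_name = mapping.get(name, name)
        if new_name in renamed:
            # Two source splits landing on the same name would silently
            # drop data, so fail loudly instead.
            raise ValueError(f"two splits map to {new_name!r}")
        renamed[new_name] = rows
    return renamed


# Swap train and test (the law_stack_exchange case), with toy rows.
swapped = remap_splits(
    {"train": ["a", "b"], "test": ["c", "d", "e"]},
    {"train": "test", "test": "train"},
)

# Use validation as test (the financial_tweets case), with toy rows.
moved = remap_splits(
    {"train": ["a", "b"], "validation": ["c"]},
    {"validation": "test"},
)
```

Whatever mechanism is used, results reported on a remapped split are not comparable to numbers published on the original split, which is the part of the question that still needs an answer.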

yoavkatz, Dec 20 '23 09:12

Also, I think it's worth changing the backbone template of classification tasks from InputOutputTemplate to MultipleChoiceTemplate, so that models can choose from the options field it generates.

elronbandel, Dec 24 '23 14:12

I think MultipleChoiceTemplate is an additional possible template. I think it's only relevant for multi-class and not multi-label.
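A rough sketch of why the distinction matters, in plain Python. This is not how MultipleChoiceTemplate is implemented; the function names and the example options are hypothetical. The point is only that a multiple-choice target is a single option, while a multi-label target is a set of labels.

```python
import string


# Hypothetical rendering of a multiple-choice prompt: one lettered line
# per option, after the input text.
def render_multiple_choice(text, options):
    letters = string.ascii_uppercase
    lines = [text]
    lines += [f"{letters[i]}. {option}" for i, option in enumerate(options)]
    return "\n".join(lines)


# Multi-class: exactly one gold label, so the target is a single letter.
def multi_class_target(options, label):
    return string.ascii_uppercase[options.index(label)]


# Multi-label: zero or more gold labels, so the target has to be a list
# (serialized here as a comma-separated string), not a single choice.
def multi_label_target(labels):
    return ", ".join(labels)


options = ["sports", "business", "politics"]
prompt = render_multiple_choice("Classify the article.", options)
single = multi_class_target(options, "business")
several = multi_label_target(["business", "politics"])
```

With several gold labels there is no single letter to pick, so a multi-label task needs a template whose output is a list of labels.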

yoavkatz, Dec 24 '23 14:12