Yoav Katz
This PR is an initial integration of the [Unitxt](https://github.com/IBM/unitxt) data processing and evaluation library into the LM-eval harness. Unitxt supports a variety of NLP tasks, datasets, templates, and metrics. The Unitxt...
Fixed YesNoTemplate, which used to assume the 'class' field was a list. DiverseLabelsSampler also assumed the choices are a list. However, in binary classification, the "choices" field is "class", which is a...
Today, if two metrics return the same score name (e.g. "f1"), the first metric's score is returned and the second metric is ignored, for example with metrics=["metrics.bert_score.distilbert_base_uncased", "metrics.bert_score.deberta_base_mnli"]. It would be good to be...
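One way to keep both scores is to prefix colliding score names with the metric name. The sketch below is only an illustration in plain Python: `merge_scores` is a hypothetical helper, not part of Unitxt, and the score values are made up.

```python
from collections import Counter


def merge_scores(per_metric_scores):
    """Flatten {metric_name: {score_name: value}} into a single dict.

    Hypothetical helper (not a Unitxt API). A score name reported by more
    than one metric is prefixed with the last part of the metric name, so
    no score is silently dropped.
    """
    # Count how many metrics report each score name.
    counts = Counter(
        score_name
        for scores in per_metric_scores.values()
        for score_name in scores
    )
    merged = {}
    for metric_name, scores in per_metric_scores.items():
        short_name = metric_name.split(".")[-1]
        for score_name, value in scores.items():
            # Only disambiguate when there is an actual collision.
            key = f"{short_name}_{score_name}" if counts[score_name] > 1 else score_name
            merged[key] = value
    return merged


# Illustrative values only.
merged = merge_scores(
    {
        "metrics.bert_score.distilbert_base_uncased": {"f1": 0.81},
        "metrics.bert_score.deberta_base_mnli": {"f1": 0.77},
    }
)
```

With this scheme, both "f1" values survive under distinct keys, while score names that appear only once keep their original name.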
For example, consider the empty format, which today is: format = SystemFormat( demo_format="{source}\\N{target_prefix}{target}\n\n", model_input_format="{system_prompt}\\N{instruction}\\N{demos}{source}\\N{target_prefix}", )
DiverseLabelsSampler is used to select few-shot examples. 1) The name is not clear. It should be something like DiverseDemosSampler. 2) DiverseLabelsSampler requires adding a choices field - which requires adding...
data_dir="tuning-data-cleared/ceramic/mixtures_02.26.2024/code/dolphin_coder", data_files={"train": "train/part0.jsonl", "test": "val/part0.jsonl"}, This causes failures. Possible solutions: 1. Raise an error if '/' in data_files, but allow fusing of data sources (see #707) 2. Handle '/' in paths....
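The first option (raising an error) could look roughly like the following. This is a minimal sketch that assumes every value in `data_files` is a single path string; `check_data_files` is a hypothetical name, not an existing Unitxt function.

```python
def check_data_files(data_files):
    """Raise a clear error if any data file path contains a '/'.

    Hypothetical validation helper; assumes each value is a single
    path string rather than a list of paths.
    """
    offending = {split: path for split, path in data_files.items() if "/" in path}
    if offending:
        raise ValueError(
            f"data_files entries must not contain '/': {offending}. "
            "Move the directory part into data_dir instead."
        )


# The configuration from the report is rejected up front.
try:
    check_data_files({"train": "train/part0.jsonl", "test": "val/part0.jsonl"})
    rejected = False
except ValueError:
    rejected = True
```

Failing early with an explicit message is easier to debug than a failure deep inside the loader.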
that allows using output of other cards in new cards
Today, task fields have no types:
```
FormTask(
    inputs=["text", "text_type", "class"],
    outputs={"class", "label"},
    metrics=[
        "metrics.f1_micro_multi_label",
        "metrics.f1_macro_multi_label",
        "metrics.accuracy",
    ],
)
```
This makes it hard for the user to...
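To illustrate what typed fields could enable, here is a small plain-Python sketch. The type mapping and the `check_field_types` helper are assumptions made for illustration; they are not the Unitxt API, and the types chosen for each field are guesses.

```python
# Hypothetical field-to-type mapping for the task inputs above.
INPUT_TYPES = {"text": str, "text_type": str, "class": str}


def check_field_types(instance, field_types):
    """Verify that an instance has every declared field with the declared type."""
    for name, expected_type in field_types.items():
        if name not in instance:
            raise KeyError(f"Missing field '{name}'")
        if not isinstance(instance[name], expected_type):
            raise TypeError(
                f"Field '{name}' should be {expected_type.__name__}, "
                f"got {type(instance[name]).__name__}"
            )


good = {"text": "I loved it", "text_type": "review", "class": "positive"}
check_field_types(good, INPUT_TYPES)  # passes silently

# A list where a string is expected is reported with the field name.
try:
    check_field_types({**good, "class": ["positive"]}, INPUT_TYPES)
    type_error_message = None
except TypeError as error:
    type_error_message = str(error)
```

A check like this would surface a mismatched field at card-preparation time instead of as a confusing failure inside a template or metric.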
With the introduction of metrics that can send data to remote services, one needs a safe way to avoid accidentally sending proprietary/confidential data to external services. In the common...
Today, unitxt uses a default seed (42) for all datasets. It's not actually possible to change the seed today. Changing the seed could affect the dataset significantly given random choices,...
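The effect of a configurable seed on random choices such as few-shot selection can be sketched with the standard library. `sample_demos` is a hypothetical helper used only for illustration; it does not reflect how Unitxt samples internally.

```python
import random


def sample_demos(pool, num_demos, seed=42):
    """Pick few-shot examples using a dedicated, seeded random generator.

    Hypothetical helper: the default of 42 mirrors the default seed
    mentioned above, but callers can override it.
    """
    rng = random.Random(seed)  # local generator, global state is untouched
    return rng.sample(pool, num_demos)


pool = [f"example_{i}" for i in range(100)]
default_demos = sample_demos(pool, 3)
same_seed_demos = sample_demos(pool, 3, seed=42)  # reproduces the default choice
other_seed_demos = sample_demos(pool, 3, seed=7)  # generally a different selection
```

Passing the seed explicitly keeps runs reproducible while still letting users measure how sensitive results are to the random choices.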