Yoav Katz issues

Results: 52 issues of Yoav Katz

Initial integration of the Unitxt to LM eval harness

This PR is an initial integration of the [Unitxt](https://github.com/IBM/unitxt) data processing and evaluation library into LM-eval harness. Unitxt supports a variety of NLP tasks, datasets, templates and metrics. The Unitxt...

Fixed YesNoTemplate and DiverseLabelsSampler to support binary task typing.

Fixed: YesNoTemplate used to assume the 'class' field was a list, and DiverseLabelsSampler assumed the choices were a list. However, in binary classification, the "choices" field is "class", which is a...

Support disambiguation of scores from metrics that return the same score names

Today, if two metrics return the same score name (e.g. "f1"), only the first metric's score is returned and the second metric is ignored. For example: metrics=["metrics.bert_score.distilbert_base_uncased","metrics.bert_score.deberta_base_mnli"]. It would be good to be...
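One possible way to disambiguate colliding score names is to prefix each one with the name of the metric that produced it. The sketch below is only an illustration under that assumption, not Unitxt's implementation: `merge_scores` is a hypothetical helper, and the score values are made up.

```python
def merge_scores(metric_results):
    """Merge per-metric score dicts, prefixing score names that collide.

    `metric_results` maps a metric name to its {score_name: value} dict.
    This helper is hypothetical and not part of Unitxt.
    """
    # Count how many metrics report each score name.
    counts = {}
    for scores in metric_results.values():
        for name in scores:
            counts[name] = counts.get(name, 0) + 1

    merged = {}
    for metric_name, scores in metric_results.items():
        short = metric_name.split(".")[-1]  # e.g. "deberta_base_mnli"
        for name, value in scores.items():
            # Only prefix names reported by more than one metric.
            key = f"{short}_{name}" if counts[name] > 1 else name
            merged[key] = value
    return merged


# Made-up scores, for illustration only.
results = {
    "metrics.bert_score.distilbert_base_uncased": {"f1": 0.81, "precision": 0.80},
    "metrics.bert_score.deberta_base_mnli": {"f1": 0.77},
}
merged = merge_scores(results)
```

With this scheme both "f1" values survive under distinct keys, while score names reported by only one metric keep their original name.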

Unitxt explore seems to be using an older version of unitxt

For example, consider the empty format: ![image](https://github.com/IBM/unitxt/assets/68273864/2a66647e-2264-4dbc-8db3-f6c7ff66ee7e) Which today is: format = SystemFormat( demo_format="{source}\\N{target_prefix}{target}\n\n", model_input_format="{system_prompt}\\N{instruction}\\N{demos}{source}\\N{target_prefix}", )

DiverseLabelsSampler clarification

DiverseLabelsSampler is used to select few-shot examples. 1) The name is not clear. It should be something like DiverseDemosSampler. 2) DiverseLabelsSampler requires adding a choices field - which requires adding...

LoadFromIBMCos does not work if datafile has '/' in name

data_dir="tuning-data-cleared/ceramic/mixtures_02.26.2024/code/dolphin_coder", data_files={"train": "train/part0.jsonl", "test": "val/part0.jsonl"}, This causes failures. Possible solutions: 1. Raise an error if '/' is in data_files, but allow fusing of data sources (see #707) 2. Handle '/' in paths....
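The first proposed solution (failing early when a data file name contains '/') could look roughly like the sketch below. `check_data_files` is a hypothetical helper written for illustration; it is not part of LoadFromIBMCos or Unitxt.

```python
def check_data_files(data_files):
    """Raise ValueError if any data file name contains a '/'.

    Hypothetical validation helper, not an existing Unitxt function.
    """
    bad = {split: name for split, name in data_files.items() if "/" in name}
    if bad:
        raise ValueError(f"data_files entries must not contain '/': {bad}")
    return data_files


# Passes: no path separators in the file names.
check_data_files({"train": "part0.jsonl", "test": "part1.jsonl"})
```

Calling it with `{"train": "train/part0.jsonl"}` would raise a `ValueError` naming the offending entry, which is clearer than a failure later during loading.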

Added initial implementation of Loader that loads existing cards

This allows using the output of other cards in new cards.

Add type checking for task definition.

Today, task fields have no types:
```
FormTask(
    inputs=["text", "text_type", "class"],
    outputs={"class", "label"},
    metrics=[
        "metrics.f1_micro_multi_label",
        "metrics.f1_macro_multi_label",
        "metrics.accuracy",
    ],
)
```
This makes it hard for the user to...
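To make the idea of typed task fields concrete, here is a minimal sketch that checks an instance against a declared mapping of field names to Python types. The `verify_fields` helper and the `input_types` mapping are assumptions made for illustration; the issue does not specify how Unitxt would declare or enforce types.

```python
def verify_fields(instance, field_types):
    """Check that every declared field exists and has the declared type.

    Hypothetical helper: `field_types` maps field name -> Python type.
    """
    for name, expected in field_types.items():
        if name not in instance:
            raise ValueError(f"missing field '{name}'")
        if not isinstance(instance[name], expected):
            actual = type(instance[name]).__name__
            raise TypeError(
                f"field '{name}' should be {expected.__name__}, got {actual}"
            )


# Assumed types for the input fields shown in the task above.
input_types = {"text": str, "text_type": str, "class": str}

# A well-typed instance passes silently.
verify_fields({"text": "hello", "text_type": "tweet", "class": "greeting"}, input_types)
```

An instance whose "class" field is a list instead of a string would raise a `TypeError` that names the field, so the user sees the mismatch where the task is applied rather than deep inside a template or metric.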

Handling sensitive data sent to remote services

With the introduction of metrics that can send data to remote services, one needs a safe way to avoid accidentally sending proprietary/confidential data to external services. In the common...

Seed control in unitxt

Today, unitxt uses a default seed (42) for all datasets. It's not actually possible to change the seed today. Changing the seed could affect the dataset significantly given random choices,...
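As a generic illustration of why seed control matters, the sketch below samples few-shot demos with a local random generator whose seed is a parameter. `sample_demos` is a hypothetical function using only Python's standard library; it does not reflect how Unitxt handles seeds internally.

```python
import random


def sample_demos(pool, num_demos, seed=42):
    """Pick `num_demos` items from `pool` using a seeded, local RNG.

    Hypothetical helper; the default of 42 mirrors the seed mentioned above.
    """
    rng = random.Random(seed)  # local generator, global random state untouched
    return rng.sample(pool, num_demos)


pool = [f"example_{i}" for i in range(100)]
a = sample_demos(pool, 3)          # default seed
b = sample_demos(pool, 3)          # same seed, same selection as `a`
c = sample_demos(pool, 3, seed=7)  # a different seed may select other demos
```

Using a local `random.Random` instance keeps runs reproducible for a given seed while letting the caller vary it explicitly.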