unitxt
🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
Fusion classes were supposed to add a field named "group" to every instance of the fused streams, stating the name of its origin stream. In turn, at metric computation time, the metric...
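A minimal sketch of the intended behavior (the function name and stream representation are illustrative, not the actual unitxt fusion API): each instance yielded by the fused stream is tagged with a "group" field naming its origin.

```python
def fuse_streams(named_streams):
    """Fuse several named streams into one, tagging each instance with a
    'group' field that records which stream it came from (sketch only)."""
    for origin, stream in named_streams.items():
        for instance in stream:
            # Copy the instance and attach the origin name under "group"
            yield {**instance, "group": origin}
```

With this tag in place, a metric can later split its computation per group.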
Currently, the demos for every instance are sampled based on the initial seed and the instance's order in the stream, instead of based solely on the content of the instance. This...
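One possible content-based alternative (a sketch, not the unitxt implementation): derive the sampler's seed from a hash of the instance itself, so the same instance always receives the same demos regardless of its position in the stream.

```python
import hashlib
import json
import random

def sample_demos(instance, demo_pool, num_demos):
    """Sample few-shot demos deterministically from the instance content,
    not from the instance's position in the stream (hypothetical sketch)."""
    # Seed derived solely from a canonical serialization of the instance
    digest = hashlib.sha256(
        json.dumps(instance, sort_keys=True).encode()
    ).hexdigest()
    rng = random.Random(digest)
    return rng.sample(demo_pool, num_demos)
```

Reordering or filtering the stream then no longer changes which demos an instance gets.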
DiverseLabelsSampler is used to select few-shot examples. 1) The name is not clear. It should be something like DiverseDemosSampler. 2) DiverseLabelsSampler requires adding a choices field - which requires adding...
When loading a metric with overwrite args (for example: `metrics.char_edit_distance[reference_field=original_text]`), the overwrites are not reflected in the result dict returned from the metric, which will carry the original metric name. This...
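A sketch of how such a reference could be parsed while keeping the full reference (including overwrites) available for naming the score; the function and its bracket syntax handling are illustrative, not the actual unitxt artifact loader:

```python
def parse_metric_ref(ref):
    """Split 'metrics.name[key=value,...]' into (name, overwrites).
    Hypothetical sketch: the full `ref` string could then be used as the
    score name in the result dict, instead of only the bare name."""
    if "[" not in ref:
        return ref, {}
    name, args = ref[:-1].split("[", 1)  # drop trailing ']' and split once
    overwrites = dict(pair.split("=", 1) for pair in args.split(","))
    return name, overwrites
```

Keeping the overwrites visible in the returned score name would disambiguate two runs of the same metric with different args.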
data_dir="tuning-data-cleared/ceramic/mixtures_02.26.2024/code/dolphin_coder", data_files={"train": "train/part0.jsonl", "test": "val/part0.jsonl"}, This causes failures. Possible solutions: 1. Raise an error if '/' appears in data_files, but allow fusing of data sources (see #707) 2. Handle '/' in paths....
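Solution 1 could be sketched as a simple guard (the function name is hypothetical) that fails fast with a clear message instead of producing a confusing failure later:

```python
def validate_data_files(data_files):
    """Possible guard for solution 1: raise a clear error when a data_files
    value contains a path separator (sketch, not the unitxt implementation)."""
    for split, path in data_files.items():
        if "/" in path:
            raise ValueError(
                f"data_files[{split!r}] = {path!r} contains '/'; "
                "nested paths are not supported here"
            )
```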
Bug in the MMLU dataset production process: ``` from unitxt.templates import MultipleChoiceTemplate # self.template = MultipleChoiceTemplate(type='multiple_choice_template', artifact_identifier='template_0', _requirements_list=[], caching=None, apply_to_streams=None, dont_apply_to_streams=None, skip_rendered_instance=True, postprocessors=['processors.first_character'], instruction='', target_prefix='', title_fields=[], input_format='Question: [question] Choices: [choices]...
Currently, we use pd.read_csv() for loading CSV files. This causes empty cells in a CSV file to be converted to NaN (which is not desired). The fix is simple: pd.read_csv('test.csv',...
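The behavior and the likely fix can be demonstrated with pandas' `keep_default_na` parameter, which stops empty cells from being interpreted as NaN:

```python
import io
import pandas as pd

csv_text = "a,b\n1,\n2,x\n"

# Default behavior: the empty cell in column 'b' becomes NaN
df_default = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False, empty cells stay as empty strings
df_fixed = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```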
that allows using output of other cards in new cards
Today, task fields have no types ``` FormTask( inputs=["text", "text_type", "class"], outputs={"class", "label"}, metrics=[ "metrics.f1_micro_multi_label", "metrics.f1_macro_multi_label", "metrics.accuracy", ], ) ``` This makes it hard for the user to...
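One way typed fields might look (a hypothetical sketch; `TypedFormTask` and its type-annotation syntax are illustrative, not the unitxt API): each field maps to an expected type, which both documents the task and enables validation.

```python
from typing import Dict, List, Type

class TypedFormTask:
    """Hypothetical task definition where every field carries a type."""

    def __init__(self, inputs: Dict[str, Type], outputs: Dict[str, Type],
                 metrics: List[str]):
        self.inputs = inputs
        self.outputs = outputs
        self.metrics = metrics

    def validate(self, instance: dict) -> None:
        """Check that each declared input field is present with the declared type."""
        for name, typ in self.inputs.items():
            if not isinstance(instance.get(name), typ):
                raise TypeError(f"field {name!r} must be {typ.__name__}")

task = TypedFormTask(
    inputs={"text": str, "text_type": str, "class": str},
    outputs={"class": str, "label": str},
    metrics=["metrics.f1_micro_multi_label", "metrics.accuracy"],
)
```

The type map doubles as user-facing documentation of what each field should contain.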