unitxt issues

MultiReferenceTemplate fails to apply output_format.

In templates.py, MultiReferenceTemplate derives from InputOutputTemplate. But MultiReferenceTemplate.outputs_to_target_and_references() doesn't call the base class to render the target using `output_format`. As a result, you just get the target mirrored when running...
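The pattern described above can be sketched with stand-in classes. This is a minimal illustration and not the unitxt code: `BaseTemplate`, `BrokenMultiRef`, `FixedMultiRef`, and the `answers`/`answer` field names are hypothetical. Only `output_format` and `outputs_to_target_and_references` come from the report, and the real method's signature and return value may differ.

```python
class BaseTemplate:
    """Hypothetical, simplified stand-in for InputOutputTemplate."""

    def __init__(self, output_format):
        self.output_format = output_format

    def outputs_to_target_and_references(self, outputs):
        # Render the target through output_format.
        target = self.output_format.format(**outputs)
        return target, [target]


class BrokenMultiRef(BaseTemplate):
    """Overrides the method and never touches output_format."""

    def outputs_to_target_and_references(self, outputs):
        references = list(outputs["answers"])
        return references[0], references


class FixedMultiRef(BaseTemplate):
    """Delegates rendering of every reference to the base class."""

    def outputs_to_target_and_references(self, outputs):
        references = [
            super(FixedMultiRef, self).outputs_to_target_and_references({"answer": a})[0]
            for a in outputs["answers"]
        ]
        return references[0], references


outputs = {"answers": ["Paris", "paris"]}
broken = BrokenMultiRef("Answer: {answer}").outputs_to_target_and_references(outputs)
fixed = FixedMultiRef("Answer: {answer}").outputs_to_target_and_references(outputs)
```

In this sketch `broken` returns the raw references unchanged, while `fixed` returns them rendered through the format string.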

jlqibm

Modify the BertScore metric report to include the metric name

Currently, when running BertScore, the report contains only ["f1", "precision", "recall"]. This may be unclear to someone expecting BertScore to appear in the metric name. Especially, when they use...
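One possible shape for the requested change is to prefix each reported key with the metric name. The sketch below is only an illustration: `prefix_scores` and the `bertscore_` naming scheme are hypothetical, and the numeric values are placeholders, not real results.

```python
def prefix_scores(scores, metric_name):
    """Return a copy of `scores` whose keys carry the metric name."""
    return {f"{metric_name}_{key}": value for key, value in scores.items()}


# Placeholder values, used only to show the renaming.
raw_report = {"f1": 0.5, "precision": 0.5, "recall": 0.5}
report = prefix_scores(raw_report, "bertscore")
```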

gitMichal

matthews_correlation returning 0 on perfect correlation

11

Why is this the accepted behavior (strict=False was set a long time ago)? The results of running the main metric used in the card (matthews_correlation) over simulated predictions...

yoavkatz

rougeL returns 0 score on perfect prediction in some languages

1

Change xlsum.py to run on all languages (remove `if lang == langs[0]:`). Run `python prepare/cards/xlsum.py`.
Traceback (most recent call last):
File "/home/runner/work/unitxt/unitxt/tests/test_preperation.py", line 47, in test_preprations
import_module_from_file(file)
File "/home/runner/work/unitxt/unitxt/tests/test_preperation.py", line...

yoavkatz

MetricPipeline is not limited in n_resamples in tests

This is the root cause: https://github.com/IBM/unitxt/blob/main/src/unitxt/test_utils/metrics.py#L75 I think we need to test whether the inner metric in MetricPipeline is a GlobalMetric object and, if so, set its n_resamples to 3.
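The check proposed above could look roughly like the following. This is a sketch under assumptions: the classes are simplified stand-ins, not the unitxt ones, the helper `limit_resamples` is hypothetical, and the name of the attribute holding the inner metric (`metric`) is assumed.

```python
class Metric:
    """Simplified stand-in base class."""


class GlobalMetric(Metric):
    def __init__(self, n_resamples=1000):
        self.n_resamples = n_resamples


class MetricPipeline(Metric):
    def __init__(self, metric):
        # Assumed attribute name for the wrapped metric.
        self.metric = metric


def limit_resamples(metric, n_resamples=3):
    """Lower n_resamples on a GlobalMetric, unwrapping a MetricPipeline first."""
    inner = metric.metric if isinstance(metric, MetricPipeline) else metric
    # Leave metrics alone when resampling is disabled (n_resamples is None or 0).
    if isinstance(inner, GlobalMetric) and inner.n_resamples:
        inner.n_resamples = n_resamples
    return metric


pipeline = limit_resamples(MetricPipeline(GlobalMetric()))
disabled = limit_resamples(GlobalMetric(n_resamples=None))
```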

yoavkatz

Explicitly state when to compute confidence intervals

4

Today confidence intervals are computed by default for the main_score. This [PR](https://github.com/IBM/unitxt/pull/431) adds the capability of computing confidence intervals for additional scores. We would like to change the confidence interval...

matanor

Remaining issues with additional datasets

2

There are a few open issues: There is no multi_label template (fix required for unfair_tos and reuters). Can I use the text_type argument? I wonder if dbpedia_14 is of type text...

yoavkatz

Bug: Dataclass not overriding properties in inheritance

1

see: https://github.com/IBM/unitxt/pull/403

elronbandel

bug

Increase n_resamples for GlobalMetric in testing so confidence intervals are not NaN

3

@eladven @matanor In test_utils/metrics.py/test_metric, for a GlobalMetric we have: `if isinstance(metric, GlobalMetric) and metric.n_resamples: metric.n_resamples = 3  # Use a low number of resamples in testing for GlobalMetric, to save runtime...`
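To make the role of the resample count concrete, here is a plain percentile bootstrap written with the standard library only. It is an illustration of what a parameter like `n_resamples` controls, under the assumption that the confidence intervals come from bootstrap resampling. It is not the unitxt implementation and does not reproduce the NaN from this issue; `bootstrap_ci` and the sample scores are hypothetical.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `scores`."""
    rng = random.Random(seed)
    # Each resample draws len(scores) items with replacement.
    means = sorted(
        statistics.mean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    low = means[int(alpha * (n_resamples - 1))]
    high = means[int((1 - alpha) * (n_resamples - 1))]
    return low, high


scores = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
# With 3 resamples both endpoints are picked from only three bootstrap means.
coarse = bootstrap_ci(scores, n_resamples=3)
finer = bootstrap_ci(scores, n_resamples=1000)
```

With very few resamples the interval rests on a handful of values, which is the trade-off between test runtime and usable intervals that this issue is about.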

sam-data-guy-iam

Consider need for requirements for operator caches

Loaders need it, do other operators? https://github.com/IBM/unitxt/pull/339

yoavkatz
