Cannot use f1/recall/precision arguments in `CombinedEvaluations.compute`
This works:

```python
import evaluate

metric = evaluate.load("f1")
metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], average=None)
```
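For reference, with `average=None` the f1 metric returns one score per class; hand-computing the confusion counts for the example above gives `{'f1': array([0.66666667, 0.5])}`.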
This won't work:

```python
import evaluate

metric = evaluate.combine(["f1"])
# `average` is silently dropped here, so the per-class scores are never returned
metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], average=None)
```
Reason: `average` is not included in the f1 metric's features (https://github.com/huggingface/evaluate/blob/eaf34a7d04e7ab3e6155a046f6d7fda01d9ead84/metrics/f1/f1.py#L112), and `CombinedEvaluations.compute` silently ignores any argument that is not included in the features (https://github.com/huggingface/evaluate/blob/eaf34a7d04e7ab3e6155a046f6d7fda01d9ead84/src/evaluate/module.py#L858).
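A minimal sketch of the filtering behaviour at the linked line, assuming it boils down to keeping only kwargs whose names appear among the metric's declared feature names (the real code does more bookkeeping):

```python
# Hedged sketch: CombinedEvaluations.compute forwards only the kwargs that
# match a metric's declared features, so `average` never reaches f1.
compute_kwargs = {
    "references": [0, 1, 0, 1, 0],
    "predictions": [0, 0, 1, 1, 0],
    "average": None,
}
feature_names = ["predictions", "references"]  # f1's declared features
forwarded = {k: v for k, v in compute_kwargs.items() if k in feature_names}
print(forwarded)  # 'average' is gone: {'references': [...], 'predictions': [...]}
```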
Is this expected or a bug?
@lvwerra do you have any opinion on that?
Yes, this is a current limitation of `combine`: you can't pass any settings to `compute`, only the features. Rather than fixing this in `combine`, we aim to enable changing the settings when the metric is loaded (#169). This should be coming in the next few weeks.
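If that lands, usage might look roughly like the following. This is a hypothetical sketch of load-time settings; the exact keyword and API in #169 are assumptions here, not a released interface:

```python
import evaluate

# Hypothetical: pass compute settings once at load time instead of per call.
f1 = evaluate.load("f1", average=None)  # assumed keyword; see #169
metric = evaluate.combine([f1])
metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0])
```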
@lvwerra Do you mind if I ask whether you have an estimate on when this issue will be closed? I reviewed #188 and noticed that it seems to be passing for the pip release. Is there a chance we can get it in the next couple of days?
@lvwerra I also need to pass custom kwargs to my metrics. My current workaround is to override `CombinedEvaluations.compute` in a child class and remove the filtering at https://github.com/huggingface/evaluate/blob/eaf34a7d04e7ab3e6155a046f6d7fda01d9ead84/src/evaluate/module.py#L858. This works in my specific use case because all my (custom) metrics accept `**kwargs` in their `compute` method, which means they will just ignore extra args. A sketch of the subclass is below.
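A minimal sketch of that workaround, assuming `CombinedEvaluations` stores its sub-metrics on `self.evaluation_modules` and that every sub-metric's `compute` tolerates extra kwargs (both assumptions; the actual override would copy the original method body and delete only the filtering line):

```python
import evaluate
from evaluate.module import CombinedEvaluations


class PermissiveCombinedEvaluations(CombinedEvaluations):
    # Hedged sketch: forward every kwarg to each sub-metric instead of
    # filtering them against the metric's declared features.
    def compute(self, predictions=None, references=None, **kwargs):
        results = {}
        for module in self.evaluation_modules:  # assumed attribute name
            results.update(
                module.compute(predictions=predictions, references=references, **kwargs)
            )
        return results


metric = PermissiveCombinedEvaluations([evaluate.load("f1")])
metric.compute(references=[0, 1, 0, 1, 0], predictions=[0, 0, 1, 1, 0], average=None)
```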
Has this been resolved?
Not yet. There is an issue with the sync mechanism between the Hub and the library, which is why we had to roll back #169. Merging it will break pre-0.3.0 installs, so we need to wait for sufficient adoption.