Explicitly state when to compute confidence intervals

Open matanor opened this issue 1 year ago • 4 comments

Today confidence intervals are computed by default for the main_score. This PR adds the capability of computing confidence intervals for additional scores.

We would like to change the confidence interval default, such that it is not computed by default, but rather only computed when explicitly stated in the metric.

matanor avatar Dec 27 '23 11:12 matanor

Today we have a mechanism for disabling confidence interval calculation, by setting n_resamples to None. That mechanism is used as the implementation of a command line parameter in FM-Eval.

There is also a mechanism for specifying a list of confidence interval scores, on which the confidence intervals are computed. This is implemented for instance metrics.

The suggestion is to control whether confidence intervals are computed solely through the list of score names, with an empty list meaning no computation. The n_resamples flag would then no longer support a value of None.
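
To make the suggestion concrete, here is a rough sketch of the intended behavior. It is not the unitxt implementation: the function names, the `ci_scores` argument, and the `_ci_low` / `_ci_high` result keys are assumptions made for illustration, and a plain percentile bootstrap over the mean stands in for whatever resampling method the library actually uses.

```python
import random
from statistics import mean


def percentile_bootstrap_ci(values, n_resamples, confidence_level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of `values` (illustrative only)."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence_level) / 2.0
    low = means[int(alpha * (n_resamples - 1))]
    high = means[int((1.0 - alpha) * (n_resamples - 1))]
    return low, high


def compute_confidence_intervals(instance_scores, ci_scores, n_resamples=1000):
    """Hypothetical helper.

    instance_scores: list of dicts mapping score name -> float.
    ci_scores: score names to compute CIs for; an empty list disables CIs.
    """
    # Under the proposal, None is no longer a valid way to disable CIs.
    if n_resamples is None or n_resamples < 1:
        raise ValueError(
            "n_resamples must be a positive integer; pass ci_scores=[] to disable"
        )
    result = {}
    for name in ci_scores:  # empty list -> loop body never runs -> no CIs
        values = [scores[name] for scores in instance_scores]
        low, high = percentile_bootstrap_ci(values, n_resamples)
        result[f"{name}_ci_low"] = low
        result[f"{name}_ci_high"] = high
    return result
```

With this shape, a metric that wants no confidence intervals passes an empty list, and a metric that wants them on several scores lists each name explicitly.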

matanor avatar Dec 27 '23 11:12 matanor

for which metrics CI is disabled? and why?

assaftibm avatar Dec 30 '23 20:12 assaftibm

I can see why latency can become an issue, but this is the case only for global metrics. For instance metrics, the CI computation should be very fast.
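
A rough sketch of why the cost differs, assuming a plain bootstrap (the function names and the `score_one` / `score_all` callables are hypothetical, not unitxt code): an instance metric can score each example once and then resample the cached scores, while a global metric has to be recomputed on every resample.

```python
import random


def bootstrap_instance_metric(predictions, references, score_one, n_resamples, seed=0):
    """Score each instance once, then resample the cached scores."""
    rng = random.Random(seed)
    # The (possibly expensive) scoring function runs len(predictions) times in total.
    scores = [score_one(p, r) for p, r in zip(predictions, references)]
    n = len(scores)
    # Each resample is only an average over already computed numbers.
    return [sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)]


def bootstrap_global_metric(predictions, references, score_all, n_resamples, seed=0):
    """Recompute the whole-dataset metric on every resample."""
    rng = random.Random(seed)
    pairs = list(zip(predictions, references))
    n = len(pairs)
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(pairs, k=n)
        preds, refs = zip(*sample)
        # The (possibly expensive) metric runs once per resample.
        estimates.append(score_all(list(preds), list(refs)))
    return estimates
```

So the instance case calls the scorer once per example regardless of `n_resamples`, while the global case calls the metric `n_resamples` times over the full dataset.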

assaftibm avatar Dec 30 '23 20:12 assaftibm

> for which metrics CI is disabled? and why?

It was disabled for the default version of rouge (here). The reason is runtime. Other users have also asked to turn it off. I think the runtime issue is indeed mainly for global metrics.

matanor avatar Dec 31 '23 06:12 matanor