unitxt Explicitly state when to compute confidence intervals

Currently, confidence intervals are computed by default for the main_score. This PR adds the capability to compute confidence intervals for additional scores.

We would like to change the confidence interval default so that it is not computed by default, and is computed only when explicitly requested in the metric.

Dec 27 '23 11:12 matanor

Currently, confidence interval calculation can be disabled by setting n_resamples to None. FM-Eval uses this mechanism to implement one of its command-line parameters.

There is also a mechanism for specifying the list of score names for which confidence intervals are computed. It is currently implemented for instance metrics.

We suggest implementing the enable/disable mechanism solely through the list of score names, with an empty list meaning that no confidence intervals are computed. The n_resamples parameter would then no longer accept None.
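A minimal sketch of the proposed behavior, assuming a hypothetical `InstanceMetricSketch` class with a `ci_scores` attribute holding the score names (the real unitxt class, attribute names and signatures may differ). Confidence intervals are produced only for the listed names, an empty list disables them, and `n_resamples` must always be a positive integer. The CI itself is a plain percentile bootstrap of the mean, chosen here only for illustration.

```python
import random
import statistics
from dataclasses import dataclass, field


@dataclass
class InstanceMetricSketch:
    """Hypothetical stand-in for an instance metric; not the unitxt API."""

    # Score names to compute CIs for; an empty list disables CI computation.
    ci_scores: list = field(default_factory=list)
    # Under the proposal this is always an int; None is rejected.
    n_resamples: int = 1000
    confidence_level: float = 0.95
    seed: int = 42

    def __post_init__(self):
        if not isinstance(self.n_resamples, int) or self.n_resamples <= 0:
            raise ValueError(
                "n_resamples must be a positive int; use ci_scores=[] to disable CIs"
            )

    def _bootstrap_ci(self, values):
        # Percentile bootstrap of the mean over cached per-instance scores.
        rng = random.Random(self.seed)
        means = sorted(
            statistics.fmean(rng.choices(values, k=len(values)))
            for _ in range(self.n_resamples)
        )
        alpha = (1 - self.confidence_level) / 2
        low = means[int(alpha * (self.n_resamples - 1))]
        high = means[int((1 - alpha) * (self.n_resamples - 1))]
        return low, high

    def aggregate(self, instance_scores):
        """instance_scores: list of dicts mapping score name -> float."""
        names = instance_scores[0].keys()
        result = {
            name: statistics.fmean(s[name] for s in instance_scores) for name in names
        }
        # Only explicitly requested scores get confidence intervals.
        for name in self.ci_scores:
            low, high = self._bootstrap_ci([s[name] for s in instance_scores])
            result[f"{name}_ci_low"] = low
            result[f"{name}_ci_high"] = high
        return result
```

For example, `InstanceMetricSketch(ci_scores=[]).aggregate(scores)` returns only the averaged scores, while `InstanceMetricSketch(ci_scores=["accuracy"])` adds `accuracy_ci_low` and `accuracy_ci_high` and leaves every other score without a CI. Keeping a single switch (the list) avoids having two overlapping ways to turn CIs off.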

Dec 27 '23 11:12 matanor

For which metrics is CI disabled, and why?

Dec 30 '23 20:12 assaftibm

I can see why latency could become an issue, but only for global metrics. For instance metrics, the CI computation should be very fast.
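One way to see why the cost differs, assuming CIs are estimated by bootstrap resampling. This is a hypothetical sketch, not unitxt's code: `CountingScorer` and both helper functions are illustrative names. An instance metric scores each example once and then only re-averages the cached scores on every resample, while a global metric has to be re-run over a full resampled dataset on every resample.

```python
import random
import statistics


class CountingScorer:
    """Hypothetical scorer that counts how many times it is invoked."""

    def __init__(self):
        self.calls = 0

    def score_instance(self, prediction, reference):
        # Instance-level score (exact match as a cheap stand-in).
        self.calls += 1
        return float(prediction == reference)

    def score_global(self, predictions, references):
        # Corpus-level score; stands in for an expensive global metric.
        # Each call processes the whole (resampled) dataset.
        self.calls += 1
        return statistics.fmean(
            float(p == r) for p, r in zip(predictions, references)
        )


def instance_metric_ci_cost(preds, refs, n_resamples, seed=0):
    scorer = CountingScorer()
    # Each instance is scored exactly once...
    scores = [scorer.score_instance(p, r) for p, r in zip(preds, refs)]
    rng = random.Random(seed)
    # ...and each resample only re-averages the cached scores.
    for _ in range(n_resamples):
        statistics.fmean(rng.choices(scores, k=len(scores)))
    return scorer.calls


def global_metric_ci_cost(preds, refs, n_resamples, seed=0):
    scorer = CountingScorer()
    scorer.score_global(preds, refs)  # the main score
    rng = random.Random(seed)
    n = len(preds)
    # The global metric must be recomputed on every resample.
    for _ in range(n_resamples):
        idx = rng.choices(range(n), k=n)
        scorer.score_global([preds[i] for i in idx], [refs[i] for i in idx])
    return scorer.calls
```

With 100 examples and 1000 resamples, the instance path calls the scorer 100 times, whereas the global path calls it 1001 times, each time over 100 examples. For an expensive global metric, that extra factor of `n_resamples` is where the latency comes from.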

Dec 30 '23 20:12 assaftibm

> for which metrics CI is disabled? and why?

It was disabled for the default version of rouge (here) because of runtime. Other users have also asked to turn it off. I agree that the runtime issue mainly concerns global metrics.

Dec 31 '23 06:12 matanor