quintd1(owid) dataset

Open ShirApp opened this issue 1 year ago • 2 comments

Hi, we have a judgement metric that is descriptive rather than rating-based: we want the judge to describe the errors in the prediction. There is no existing task for that. What do you suggest? @elronbandel

Aug 26 '24 14:08 ShirApp

What will the output of the judge be? And how will it be aggregated into a final score for the dataset?

Aug 26 '24 15:08 elronbandel

The output will be a JSON object containing the evidence and type of each error in the prediction. The aggregation could be, for example, the number of errors.

Aug 26 '24 15:08 ShirApp
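
To make the proposal above concrete, here is a minimal sketch in plain Python of how such judge outputs could be parsed and aggregated by error count. It does not use any existing library task or metric API. The JSON shape (a top-level `"errors"` list whose items carry `"type"` and `"evidence"`) and the function names `parse_errors` and `aggregate_error_counts` are assumptions made for illustration; the thread does not specify the schema.

```python
import json
from collections import Counter
from typing import Optional


def parse_errors(judge_output: str) -> Optional[list]:
    """Parse one judge response in the ASSUMED shape
    {"errors": [{"type": ..., "evidence": ...}, ...]}.

    Returns the list of error dicts, or None if the response is not
    valid JSON or does not match the assumed shape.
    """
    try:
        data = json.loads(judge_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("errors"), list):
        return None
    # Keep only well-formed entries (dicts); ignore anything else.
    return [e for e in data["errors"] if isinstance(e, dict)]


def aggregate_error_counts(judge_outputs: list) -> dict:
    """Aggregate per-instance judge outputs into dataset-level numbers."""
    parsed = [parse_errors(o) for o in judge_outputs]
    valid = [errs for errs in parsed if errs is not None]
    counts = [len(errs) for errs in valid]
    by_type = Counter(
        e.get("type", "unknown") for errs in valid for e in errs
    )
    return {
        "total_errors": sum(counts),
        # Mean is taken over the instances that parsed successfully.
        "mean_errors_per_instance": sum(counts) / len(counts) if counts else 0.0,
        "errors_by_type": dict(by_type),
        "unparsed_outputs": len(parsed) - len(valid),
    }


if __name__ == "__main__":
    outputs = [
        '{"errors": [{"type": "INCORRECT", "evidence": "GDP fell in 2020"},'
        ' {"type": "MISLEADING", "evidence": "sharp rise"}]}',
        '{"errors": []}',
        "not valid json",
    ]
    print(aggregate_error_counts(outputs))
```

Unparseable responses are reported separately rather than counted as zero errors, so a judge that fails to emit valid JSON does not make a prediction look error-free. Whether a lower count should map to a higher final score is a separate decision that the thread leaves open.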