unitxt
quintd1(owid) dataset
Hi, we have a judgment metric that is descriptive rather than rating-based: we want the judge to describe the errors in the prediction. There is no existing task for that. What do you suggest? @elronbandel
What will the judge output? How will it be aggregated into a final score for the dataset?
The output will be a JSON object containing the evidence and type of each error in the prediction. The aggregation could be, for example, the number of errors.
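To make the aggregation concrete, here is a rough sketch in plain Python of how the error count could be computed from the judge's responses. It does not use any unitxt API. The JSON layout (an `errors` list whose items have `type` and `evidence` fields), the example error type, and the function names `count_errors` and `aggregate` are all assumptions for illustration, not an agreed format.

```python
import json
from typing import Optional


def count_errors(judge_output: str) -> Optional[int]:
    """Return the number of errors in one judge response, or None if unparseable.

    Assumes a hypothetical layout: {"errors": [{"type": ..., "evidence": ...}, ...]}
    """
    try:
        parsed = json.loads(judge_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    errors = parsed.get("errors")
    if not isinstance(errors, list):
        return None
    return len(errors)


def aggregate(judge_outputs: list) -> dict:
    """Aggregate per-instance error counts into dataset-level scores."""
    counts = [count_errors(output) for output in judge_outputs]
    valid = [c for c in counts if c is not None]
    return {
        # Mean is taken over parseable responses only.
        "mean_errors": sum(valid) / len(valid) if valid else None,
        "total_errors": sum(valid),
        # Responses that were not valid JSON in the assumed layout.
        "unparsed": len(counts) - len(valid),
    }


# Hypothetical judge responses; the error type and evidence are made up.
outputs = [
    '{"errors": [{"type": "INCORRECT", "evidence": "rose by 5%"}]}',
    '{"errors": []}',
    "not json",
]
result = aggregate(outputs)
print(result)
```

Reporting unparseable responses separately, instead of counting them as zero errors, keeps a judge that fails to follow the format from looking like an error-free prediction.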