instruct-eval
What are the metrics for the evaluation results?
Accuracy? Exact match? F1-score?
I cannot find the description in the paper: