lm-evaluation-harness
RACE dataset?
Hi,
I have a quick question about the RACE dataset. The harness appears to evaluate on the EleutherAI/RACE dataset, which contains approximately 1,000 examples. However, the original RACE dataset consists of two subsets, "high" and "middle", each with about 4k examples. The dataset card says that EleutherAI/RACE is the test set of the "high" subset. Can you explain why the sizes don't match? Thanks!