test
Measuring Massive Multitask Language Understanding | ICLR 2021
We have a leaderboard here https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu. Feel free to merge this PR if you think this would be useful to link to!
I have spotted (or more precisely, my schema validator has spotted) three questions where the choices feature a duplicate. All are in the 'validation' set. Specifically: In elementary_mathematics_val.csv ``` What...
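A check like the one described can be sketched in a few lines. This is a minimal illustration, not the reporter's actual schema validator. It assumes the MMLU CSV layout of six header-less columns (question, A, B, C, D, answer). `find_duplicate_choices` and `load_rows` are hypothetical helper names.

```python
import csv


def find_duplicate_choices(rows):
    """Return (row_index, question) for every row whose four choices are not all distinct.

    Assumes each row follows the MMLU CSV layout: question, A, B, C, D, answer.
    """
    flagged = []
    for i, row in enumerate(rows):
        question, *choices = row[:5]
        # Strip surrounding whitespace so " 4" and "4" count as the same choice.
        normalised = [c.strip() for c in choices]
        if len(set(normalised)) < len(normalised):
            flagged.append((i, question))
    return flagged


def load_rows(path):
    # Hypothetical loader: the MMLU CSVs have no header row, so every line is a question.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))
```

Usage might look like `find_duplicate_choices(load_rows("elementary_mathematics_val.csv"))`, where the path is assumed to point at a local copy of the file. Whitespace is normalised because near-duplicates that differ only in padding are just as confusing to a model as exact ones.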
I can't reach the link to download the dataset.
The answer to the last question in `high_school_computer_science_dev.csv` is incorrect: ``` A list of numbers has n elements, indexed from 1 to n. The following algorithm is intended to display...
Hello authors, I am really impressed with your efforts in creating this benchmark! One small thing I noticed is that OpenAI seems to limit the 'logprobs' argument to at most...
It was made using the existing [FLAN eval script](https://github.com/hendrycks/test/blob/master/evaluate_flan.py) as a reference. Minor changes: - load models as Float16; - put the samples on the same device as a model;...
I pulled the test data linked in the README, and I notice that within each category there is almost never an even 25% split between A, B, C, and D...
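One way to quantify the imbalance is to count the gold answer letters per subject file and compare them with a uniform split. The sketch below assumes the answer letter sits in the last column of each header-less MMLU CSV row. The helper names are hypothetical. With roughly 100 questions per subject, an exact 25/25/25/25 split is unlikely even under purely random assignment. A chi-squared statistic against the uniform distribution is therefore a fairer gauge than eyeballing the percentages.

```python
from collections import Counter

LETTERS = "ABCD"


def answer_counts(rows):
    # Assumes the gold answer letter is the last column of each row.
    return Counter(row[-1].strip() for row in rows if row)


def answer_distribution(rows):
    """Fraction of rows whose gold answer is A, B, C, or D (other labels ignored)."""
    counts = answer_counts(rows)
    total = sum(counts[letter] for letter in LETTERS)
    if total == 0:
        return {}
    return {letter: counts[letter] / total for letter in LETTERS}


def chi_square_vs_uniform(rows):
    """Pearson chi-squared statistic of the answer counts against a uniform 25% split."""
    counts = answer_counts(rows)
    n = sum(counts[letter] for letter in LETTERS)
    if n == 0:
        return 0.0
    expected = n / 4
    return sum((counts[letter] - expected) ** 2 / expected for letter in LETTERS)
```

With four categories there are 3 degrees of freedom. A statistic above about 7.81 would suggest, at the 5% level, that the answer letters in that file are not uniformly distributed. Smaller values are consistent with ordinary sampling noise around 25%.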
In the paper, anatomy was categorised into STEM, while in the `categories.py` file, anatomy is categorised into "health" and then "other". Which one is wrong here?
Hello, while trying to use your code, I ran into an error when fetching encoder.json and encoder.bpe. I resolved the error by removing a whitespace in crop.py line 15 (there...