Measuring Massive Multitask Language Understanding | ICLR 2021

Results 14 test issues

We have a leaderboard here https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu. Feel free to merge this PR if you think this would be useful to link to!

I have spotted (or more precisely, my schema validator has spotted) three questions where the choices feature a duplicate. All are in the 'validation' set. Specifically: In elementary_mathematics_val.csv ``` What...
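
A check like the one described here can be sketched in a few lines. This is a minimal illustration, not the reporter's schema validator. It assumes each CSV row has six header-less columns (question, four choices, answer letter); `find_duplicate_choices` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io


def find_duplicate_choices(csv_text):
    """Return (row_number, choices) for rows whose four choices are not all distinct."""
    flagged = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != 6:
            continue  # skip rows that do not match the assumed six-column layout
        choices = [choice.strip() for choice in row[1:5]]
        if len(set(choices)) < len(choices):
            flagged.append((row_number, choices))
    return flagged


# Hypothetical sample rows, made up for illustration only.
sample = "What is 2 + 2?,3,4,4,5,B\nWhat is 3 * 3?,6,7,8,9,D\n"
duplicates = find_duplicate_choices(sample)
print(duplicates)
```

To run it against a real file, read the file's contents with `open(path, newline="").read()` and pass the text in.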

I can't connect to the link to download the dataset.

The answer to the last question in `high_school_computer_science_dev.csv` is incorrect: ``` A list of numbers has n elements, indexed from 1 to n. The following algorithm is intended to display...

Hello authors, I am really impressed with your efforts in creating this benchmark! One small thing I notice is that OpenAI seems to limit the 'logprobs' argument to at most...

It was made using the existing [FLAN eval script](https://github.com/hendrycks/test/blob/master/evaluate_flan.py) as a reference. Minor changes: - load models as Float16; - put the samples on the same device as a model;...

I pulled the test data linked in the README, and I am noticing within each category there is basically never an even 25% split between A, B, C, and D.....
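
The observation can be reproduced by tallying the answer column per file. The following is a rough sketch under the assumption that each row has six header-less columns with the answer letter last; `answer_distribution` and the sample rows are hypothetical and do not come from the dataset.

```python
import csv
import io
from collections import Counter


def answer_distribution(csv_text):
    """Return the fraction of rows whose answer is A, B, C, or D."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 6:  # assumed layout: question, 4 choices, answer
            counts[row[5].strip()] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in "ABCD"}


# Hypothetical rows for illustration only.
sample_rows = (
    "Q1,a,b,c,d,A\n"
    "Q2,a,b,c,d,B\n"
    "Q3,a,b,c,d,B\n"
    "Q4,a,b,c,d,D\n"
)
distribution = answer_distribution(sample_rows)
print(distribution)
```

With only a few hundred questions or fewer per category, some deviation from an exact 25% share per letter is expected from sampling alone, so a tally like this is a starting point rather than evidence of a problem.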

In the paper, anatomy was categorised into STEM while in the `categories.py` file, anatomy is categorised into "health" and then "other". Which one is wrong here?

Hello. While trying to use your code, I ran into an error fetching encoder.json and encoder.bpe. I resolved it by removing a whitespace character in crop.py line 15 (there...