test issues

Results 14 test issues


Update README.md with papers with code mirror

We have a leaderboard here https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu. Feel free to merge this PR if you think this would be useful to link to!

RJT1990

Duplicate Answers in Validation Set

I have spotted (or more precisely, my schema validator has spotted) three questions where the choices feature a duplicate. All are in the 'validation' set. Specifically: In elementary_mathematics_val.csv ``` What...

riedgar-ms
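The duplicate-choice check described in this issue can be approximated in a few lines of Python. This is a minimal sketch, not the reporter's schema validator. It assumes each CSV row has no header and six columns (question, A, B, C, D, answer). The function name `find_duplicate_choices` is hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_duplicate_choices(rows: Iterable[List[str]]) -> List[Tuple[int, str]]:
    """Return (row_index, question) for each row whose four choices are not all distinct.

    Assumed (hypothetical) row layout: [question, A, B, C, D, answer], no header.
    """
    flagged = []
    for i, row in enumerate(rows):
        question, choices = row[0], row[1:5]
        # Strip surrounding whitespace so " 7" and "7" are treated as the same choice.
        normalized = [c.strip() for c in choices]
        if len(set(normalized)) < len(normalized):
            flagged.append((i, question))
    return flagged


# Small in-memory example; the second row repeats the choice "7".
sample = io.StringIO("What is 2+2?,3,4,5,6,B\nPick one,7,7,8,9,C\n")
print(find_duplicate_choices(csv.reader(sample)))
```

To run it on a real split, pass `csv.reader(f)` for a file opened with `newline=""`. An example is `elementary_mathematics_val.csv`, provided its layout matches the assumption above.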

Cannot download

I can't connect to the link for downloading the dataset.

fourfireM

Hrvatski (Croatian)

Deni7s

Incorrect answer for Q5 in high_school_computer_science_dev.csv

The answer to the last question in `high_school_computer_science_dev.csv` is incorrect: ``` A list of numbers has n elements, indexed from 1 to n. The following algorithm is intended to display...

TenType

It seems that setting logprobs=100 is no longer useful.

Hello authors, I am really impressed with your efforts in creating this benchmark! One small thing I noticed is that OpenAI seems to limit the 'logprobs' argument to at most...

KL4805

Evaluation script for Huggingface Causal models

It was made using the existing [FLAN eval script](https://github.com/hendrycks/test/blob/master/evaluate_flan.py) as a reference. Minor changes: - load models as Float16; - put the samples on the same device as the model;...

ollmer

Answers A, B, C, D are not all equally likely - is it really accurate to use random baseline as comparison?

I pulled the test data linked in the README, and I am noticing within each category there is basically never an even 25% split between A, B, C, and D.....

bmosaicml
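The question raised here can be made concrete with a short Python sketch. Uniform random guessing has an expected accuracy of exactly 1/4 however the answers are skewed, because the sum over k of p_k × 1/4 equals 1/4. A stronger trivial baseline always picks the most frequent letter and scores max_k p_k. The sketch assumes the answer letters have already been read from a subject's CSV (for example, the sixth column under a header-less question, A, B, C, D, answer layout). The function name `baselines` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

LETTERS = "ABCD"


def baselines(answers: Iterable[str]) -> Dict[str, float]:
    """Compare chance-level baselines for one subject's answer key (hypothetical helper)."""
    counts = Counter(a.strip().upper() for a in answers)
    total = sum(counts[letter] for letter in LETTERS)
    if total == 0:
        raise ValueError("no A-D answers found")
    freqs = {letter: counts[letter] / total for letter in LETTERS}
    # Uniform random guessing: sum_k p_k * (1/4) = 1/4, regardless of the skew.
    uniform = sum(p * (1 / len(LETTERS)) for p in freqs.values())
    # Always answering the most frequent letter scores max_k p_k.
    majority = max(freqs.values())
    return {"uniform_random": uniform, "majority_letter": majority, **freqs}


print(baselines(["A", "A", "B", "C", "A", "D", "B", "A"]))
```

With this skewed toy key, the uniform baseline stays at 0.25, while always answering "A" would score 0.5. The gap between the two numbers shows how much an uneven answer split matters for a given subject.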

Mismatched dataset categories

In the paper, anatomy is categorised under STEM, while in the `categories.py` file it is categorised under "health" and then "other". Which one is wrong?

zhichengg

Fetching encoder json and bpe does not work (fixed by removing a typo)

Hello, while trying to use your code, I ran into an error when fetching encoder.json and encoder.bpe. I resolved the error by removing a whitespace in crop.py line 15 (there...

kovacgrgur
