
Answers A, B, C, D are not all equally likely - is it really accurate to use random baseline as comparison?

Open bmosaicml opened this issue 1 year ago • 0 comments

I pulled the test data linked in the README and noticed that within each category there is almost never an even 25% split between A, B, C, and D.

The most imbalanced category is high school statistics, for which 47% of the answers are D.

I have two questions. First, is my analysis correct? I was using the test data downloadable from the main repo. Second, if it is correct, wouldn't the random baseline be an unfair comparison, since always guessing the majority answer would do much better?

I used the data here: https://people.eecs.berkeley.edu/~hendrycks/data.tar
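
For anyone who wants to reproduce the check, here is a minimal sketch of how the per-category answer split and the majority-answer baseline could be computed. It assumes the archive unpacks to one CSV per category (matched here with the hypothetical pattern `*_test.csv`), with no header row and the answer letter in the last column; the function names are illustrative and not part of the repo.

```python
import csv
from collections import Counter
from pathlib import Path


def answer_distribution(labels):
    """Return {letter: fraction} over the answer letters A-D."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {letter: counts.get(letter, 0) / total for letter in "ABCD"}


def majority_baseline(labels):
    """Accuracy obtained by always guessing the most frequent letter."""
    return max(answer_distribution(labels).values())


def read_labels(csv_path):
    # Assumption: no header row, answer letter stored in the last column.
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[-1].strip() for row in csv.reader(f) if row]


def report(test_dir):
    # Assumption: one file per category, named <category>_test.csv.
    for path in sorted(Path(test_dir).glob("*_test.csv")):
        labels = read_labels(path)
        if not labels:
            continue  # skip empty files to avoid dividing by zero
        dist = {k: round(v, 3) for k, v in answer_distribution(labels).items()}
        print(path.stem, dist, "majority:", round(majority_baseline(labels), 3))
```

Calling `report("data/test")` (the path is a guess at the extracted layout) would print one line per category. Note that picking the majority letter from the test split itself is optimistic; a fairer majority baseline would choose the letter from a separate split and then score it on the test set.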

bmosaicml, Apr 19 '23 19:04