Answers A, B, C, and D are not equally likely: is a random baseline really a fair comparison?
I pulled the test data linked in the README and noticed that within each category there is almost never an even 25% split among A, B, C, and D.
The most imbalanced category is high school statistics, where 47% of the answers are D.
I have two questions. First, is my analysis correct? I used the test data downloadable from the main repo. Second, if it is, isn't a random baseline an unfair comparison, since a majority-vote baseline would do much better?
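To make the comparison concrete, here is a small worked calculation. Let $p_A, p_B, p_C, p_D$ be the share of each answer letter within a category (these symbols are introduced just for this illustration). A guesser picking uniformly at random is correct with expected accuracy

$$
\mathbb{E}[\text{acc}_{\text{uniform}}] = \sum_{k \in \{A,B,C,D\}} p_k \cdot \tfrac{1}{4} = \tfrac{1}{4},
$$

regardless of how imbalanced the labels are, while always answering the most common letter gives

$$
\text{acc}_{\text{majority}} = \max_k p_k,
$$

which would be $0.47$ for high school statistics if the 47% figure holds.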
I used the data here: https://people.eecs.berkeley.edu/~hendrycks/data.tar
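For anyone who wants to reproduce the per-category counts, here is a minimal sketch. It assumes the archive extracts to `data/test/` with one headerless `<subject>_test.csv` file per category, where each row is question, A, B, C, D, answer (so the last column is the answer letter). The directory name, file layout, and helper names (`answer_distribution`, `majority_baseline`, `analyze_dir`) are my assumptions, not code from the repo.

```python
import csv
import glob
import os
from collections import Counter

# Assumed location after extracting data.tar; adjust if the layout differs.
DATA_DIR = "data/test"


def answer_distribution(rows):
    """Count answer letters, assuming the last column of each row is the answer."""
    return Counter(row[-1].strip() for row in rows if row)


def majority_baseline(counts):
    """Accuracy of always predicting the most common answer letter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


def analyze_dir(test_dir):
    """Map each subject (file name minus '_test.csv') to (counts, majority accuracy)."""
    results = {}
    for path in sorted(glob.glob(os.path.join(test_dir, "*_test.csv"))):
        subject = os.path.basename(path)[: -len("_test.csv")]
        with open(path, newline="", encoding="utf-8") as f:
            counts = answer_distribution(csv.reader(f))
        results[subject] = (counts, majority_baseline(counts))
    return results


if __name__ == "__main__":
    for subject, (counts, acc) in analyze_dir(DATA_DIR).items():
        total = sum(counts.values())
        if total == 0:
            continue  # skip empty files
        # Share of each letter, then the majority-vote accuracy for this subject.
        shares = " ".join(f"{k}={counts[k] / total:.0%}" for k in "ABCD")
        print(f"{subject}: {shares} majority={acc:.1%}")
```

Comparing each subject's `majority=` value against 25% shows how far a label-frequency baseline sits above uniform guessing. Note that this baseline is computed from the test labels themselves, so it is a reference point rather than something a model could know in advance.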