MedQA

Human baseline

Open vlievin opened this issue 3 years ago • 5 comments

Hi! Is there a known human baseline for this dataset (open and closed book)? Or maybe the required score to pass the exam?

vlievin avatar Jun 01 '22 08:06 vlievin

Nope, but you can use a score of 60 out of 100 as a reference, which is the passing score for the Med Exam.

jind11 avatar Jun 03 '22 18:06 jind11

Thank you for the info! So, just to make sure we are aligned here: does that mean 60% answering accuracy for the US, TW and MC datasets?

vlievin avatar Jun 10 '22 10:06 vlievin

Yep, 60% accuracy can be considered the human passing score. For the US dataset, this score is quite hard to reach.

jind11 avatar Jun 10 '22 18:06 jind11

> Yep, 60% accuracy can be considered the human passing score. For the US dataset, this score is quite hard to reach.

Can you elaborate on your reasoning here, please? It seems a somewhat flawed heuristic to assume that human performance on this particular QA dataset [which to my understanding is generated?] would be approximately equivalent to the performance of humans tested in actual exams.

miraculixx avatar May 01 '24 08:05 miraculixx