ScandEval
Truthfulness evaluation
This would add an evaluation of decoder language models that is orthogonal to the existing ones, measuring how often they hallucinate.
OpenAI's SimpleQA dataset might be relevant: https://openai.com/index/introducing-simpleqa/
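As a rough illustration of how a hallucination score could be reported, here is a minimal sketch. It assumes each model answer has already been graded upstream as "correct", "incorrect" or "not attempted", as SimpleQA does. The function name `simpleqa_style_metrics` and the label strings are hypothetical and not part of ScandEval. The "overall correct", "correct given attempted" and F-score metrics follow SimpleQA's reporting. The `hallucination_rate` entry is an extra, hypothetical proxy: the share of attempted answers that were wrong.

```python
from collections import Counter

# Hypothetical grade labels, assuming answers were graded upstream SimpleQA-style.
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"


def simpleqa_style_metrics(grades: list[str]) -> dict[str, float]:
    """Aggregate per-question grades into SimpleQA-style summary metrics."""
    counts = Counter(grades)
    total = len(grades)
    correct = counts[CORRECT]
    attempted = correct + counts[INCORRECT]  # "not attempted" answers are excluded

    overall_correct = correct / total if total else 0.0
    correct_given_attempted = correct / attempted if attempted else 0.0

    # Harmonic mean of the two accuracy figures above.
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    # Hypothetical proxy (not from SimpleQA): fraction of attempted answers that were wrong.
    hallucination_rate = counts[INCORRECT] / attempted if attempted else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
        "hallucination_rate": hallucination_rate,
    }


if __name__ == "__main__":
    print(simpleqa_style_metrics([CORRECT, INCORRECT, NOT_ATTEMPTED, CORRECT]))
```

Rewarding "not attempted" over "incorrect" in this way means a model that abstains when unsure is penalised less than one that confidently makes things up, which is the behaviour a truthfulness evaluation would want to surface.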