
[EVAL] Big-Bench Extra Hard (BBEH)

Open lewtun opened this issue 9 months ago • 2 comments

Evaluation short description

Google has released BBEH to address the saturation of BBH in the latest generation of LLMs. Overall, it looks like a good benchmark for probing reasoning capabilities.

Evaluation metadata

Provide all available

  • Paper url: https://arxiv.org/pdf/2502.19187
  • Github url: https://github.com/google-deepmind/bbeh
  • Dataset url:

lewtun, Mar 03 '25 15:03

I'd like to implement this benchmark, if it's still open! Also, I found this unofficial hub upload of the dataset: https://huggingface.co/datasets/BBEH/bbeh. Since there's no official upload, can we use this one, or would it be better to create our own upload, similar to the one for the original BBH: https://huggingface.co/datasets/lighteval/bbh ?

itsmejul, Jul 04 '25 17:07

@NathanHB is this still relevant? I would be happy to work on this and add it to lighteval.

jgyasu, Nov 25 '25 09:11