Add BigBench Tasks for evaluation
Hi, it would be cool to evaluate all OpenAI models on the Beyond the Imitation Game Benchmark (BIG-bench), a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. The more than 200 tasks included in BIG-bench are summarized by keyword here, and by task name here. A paper introducing the benchmark, including evaluation results on large language models, is currently under review and is available as a preprint.
I believe a significant part of BIG-bench requires logprobs, which our API doesn't currently support. However, feel free to open a PR to add BIG-bench evals!
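For context on why logprobs matter here: many BIG-bench tasks are multiple-choice, and the standard scoring compares the log-probability the model assigns to each candidate answer as a continuation of the prompt, picking the highest. A minimal sketch of that scoring step, using made-up stand-in logprob values rather than real API output (`pick_answer` and `score_example` are hypothetical helper names, not part of the evals framework):

```python
# Sketch of multiple-choice scoring via per-option logprobs.
# `option_logprobs` maps each candidate answer to the (stand-in)
# log-probability the model assigned it as a continuation.

def pick_answer(option_logprobs):
    """Return the option with the highest model log-probability."""
    return max(option_logprobs, key=option_logprobs.get)

def score_example(option_logprobs, target):
    """1.0 if the argmax option matches the gold answer, else 0.0."""
    return 1.0 if pick_answer(option_logprobs) == target else 0.0

example = {"Yes": -0.4, "No": -1.6, "Maybe": -2.3}
print(score_example(example, "Yes"))  # → 1.0
```

Without per-option logprobs from the API, this comparison can't be done directly, which is why that subset of tasks is hard to port as-is.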