prompt2model

Benchmarking of prompt2model on composite benchmarks

Open · neubig opened this issue on Aug 25, 2023 · 0 comments

Currently, we have benchmarked prompt2model extensively on three tasks (as detailed in our preprint).

But it would be much cooler to benchmark it on many of the tasks included in composite benchmarks. Examples include:

  1. BIG-Bench
  2. OpenAI evals
  3. EleutherAI LM Evaluation Harness

To do this, we'll need to:

  • [ ] create code that interfaces with each of these benchmarks
  • [ ] run experiments
  • [ ] probably finish https://github.com/neulab/prompt2model/issues/285
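As a rough starting point for the first checkbox, one option is a thin adapter layer that turns each benchmark task into an (instruction, test examples) pair, builds one model per task from the instruction, and scores it. The sketch below is hypothetical: `BenchmarkTask`, `run_composite_benchmark`, `macro_average`, and `toy_builder` are illustrative names, not existing prompt2model, BIG-Bench, OpenAI evals, or EleutherAI LM Evaluation Harness APIs. It also assumes exact-match scoring, whereas the real benchmarks define their own per-task metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical container for one task pulled from a composite benchmark.
# Not an existing prompt2model or benchmark class.
@dataclass
class BenchmarkTask:
    name: str
    instruction: str  # natural-language prompt that would be given to prompt2model
    examples: List[Tuple[str, str]]  # (input, expected output) test pairs


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the whitespace-stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def run_composite_benchmark(
    tasks: List[BenchmarkTask],
    build_model: Callable[[str], Callable[[str], str]],
) -> Dict[str, float]:
    """Build one model per task from its instruction and score it.

    `build_model` is a hypothetical stand-in for the prompt2model pipeline:
    it takes a task instruction and returns a function mapping input to output.
    """
    scores: Dict[str, float] = {}
    for task in tasks:
        model = build_model(task.instruction)
        if not task.examples:
            scores[task.name] = 0.0  # no test data: record zero instead of dividing by zero
            continue
        total = sum(exact_match(model(x), y) for x, y in task.examples)
        scores[task.name] = total / len(task.examples)
    return scores


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean over tasks, a common way to summarize a composite benchmark."""
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy usage: a fake "builder" in place of a real prompt2model run.
def toy_builder(instruction: str) -> Callable[[str], str]:
    if "Reverse" in instruction:
        return lambda s: s[::-1]
    return lambda s: s  # deliberately wrong for the uppercase task


tasks = [
    BenchmarkTask("reverse", "Reverse the input string.", [("abc", "cba"), ("xy", "yx")]),
    BenchmarkTask("upper", "Uppercase the input.", [("hi", "HI"), ("ok", "OK")]),
]
scores = run_composite_benchmark(tasks, toy_builder)
print(scores)                 # {'reverse': 1.0, 'upper': 0.0}
print(macro_average(scores))  # 0.5
```

Keeping the per-benchmark glue confined to producing `BenchmarkTask` objects would let the experiment loop stay the same across BIG-Bench, OpenAI evals, and the Harness.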

neubig, Aug 25 '23 13:08