Benchmarking of prompt2model on composite benchmarks
Currently, we have benchmarked prompt2model extensively on three tasks (as detailed in our preprint).
But it would be much cooler if we could benchmark it on the wide range of tasks included in composite benchmarks. Examples include:
In order to do this, we'll have to:
- [ ] create code that interfaces with each of these benchmarks
- [ ] run experiments
- [ ] probably finish https://github.com/neulab/prompt2model/issues/285
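For the first checkbox, one possible shape for the benchmark-interfacing code is a small adapter protocol that every composite benchmark implements, so the experiment runner can iterate over tasks uniformly. This is only a hypothetical sketch — `TaskExample`, `BenchmarkAdapter`, and `InMemoryAdapter` are not existing prompt2model names, and a real adapter would load data from the benchmark itself rather than an in-memory dict:

```python
from dataclasses import dataclass
from typing import Iterable, Protocol

# Hypothetical sketch only: these names do not exist in prompt2model.
# Each composite benchmark would get one adapter implementing this protocol.


@dataclass
class TaskExample:
    """A single (input, output) pair drawn from a benchmark task."""

    input_text: str
    output_text: str


class BenchmarkAdapter(Protocol):
    """Uniform interface the experiment runner would consume."""

    def task_names(self) -> list[str]:
        """Return the names of all tasks in this benchmark."""
        ...

    def examples(self, task_name: str) -> Iterable[TaskExample]:
        """Yield the evaluation examples for one task."""
        ...


class InMemoryAdapter:
    """Toy adapter backed by a dict, standing in for a real benchmark loader."""

    def __init__(self, tasks: dict[str, list[tuple[str, str]]]):
        self._tasks = tasks

    def task_names(self) -> list[str]:
        return sorted(self._tasks)

    def examples(self, task_name: str) -> Iterable[TaskExample]:
        for inp, out in self._tasks[task_name]:
            yield TaskExample(inp, out)


adapter = InMemoryAdapter({"toy_task": [("some input", "some output")]})
print(adapter.task_names())  # → ['toy_task']
```

With this shape, "run experiments" reduces to a loop over `adapter.task_names()` that hands each task's examples to the existing evaluation pipeline, and adding a new composite benchmark means writing one more adapter.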