catwalk
Adding models/methods/datasets
Motivation: Several people have already asked for additions to Catwalk. Building them is risky because nobody is using Catwalk yet, but several people have said they want to (Pradeep, Matt/Hamish, Iz?, Ludwig).
Here are the sub-projects in order of importance:
- Promptsource: This is the most requested one. The task would be to add a promptsource instance format to as many tasks as possible, and then evaluate various models with that format.
- Make sure we have all the tasks in P3. This might be a no-op after the first item.
- Few-shot prompting (for in-context learning). Nobody has explicitly asked for this, probably because everyone assumes Catwalk would have it.
- Crossfit: Pradeep wants to use Crossfit. Those tasks should be fairly easy to add.
- T-Few: It seems like a good baseline for a lot of our work, so its tasks might serve as a good benchmark set for a while.
- BigBench: Pradeep asked about those too, but backed off from it later. Could be a nice addition, but is at the bottom of this list on purpose.
- Prompt format that uses the "channel method" for decoder-only models. Nobody has asked for it, but it came up in our reading group. I thought I could verify it with a quick experiment, but I could not.
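
To make the last item concrete, here is a minimal sketch of how a "channel" prompt format differs from the usual direct one. It assumes that "channel method" means noisy-channel prompting, where each label is scored by how well it predicts the input, P(input | label), instead of P(label | input). None of this is Catwalk code: `log_prob`, `toy_log_prob`, `direct_predict`, and `channel_predict` are hypothetical names, and the toy scorer is only a stand-in for a real decoder-only model.

```python
from typing import Callable, Sequence

# Hypothetical scorer signature: log_prob(context, continuation)
# should return log P(continuation | context) under some language model.
LogProbFn = Callable[[str, str], float]


def direct_predict(log_prob: LogProbFn, text: str, labels: Sequence[str]) -> str:
    """Direct format: pick the label the model finds most likely given the input."""
    return max(labels, key=lambda label: log_prob(text, label))


def channel_predict(log_prob: LogProbFn, text: str, labels: Sequence[str]) -> str:
    """Channel format: pick the label under which the input is most likely."""
    return max(labels, key=lambda label: log_prob(label, text))


def toy_log_prob(context: str, continuation: str) -> float:
    """Toy stand-in for a model: rewards word overlap, penalizes length."""
    shared = set(context.lower().split()) & set(continuation.lower().split())
    return float(len(shared)) - 0.1 * len(continuation.split())


# Illustrative inputs only, not taken from any real task.
labels = ["this review is positive", "this review is negative"]
text = "a positive and fun movie"

direct_choice = direct_predict(toy_log_prob, text, labels)
channel_choice = channel_predict(toy_log_prob, text, labels)
```

With a real model, `log_prob` would sum the token log-probabilities of the continuation given the context, so the two formats differ only in which string is used as the context.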