evaluation
Code and Data for Evaluation WG
My attempt to add the ANLI dataset (issue #32), including:
- Load ANLI and reformat each of the three validation splits (R1, R2, R3) into the prompt provided by the...
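The reformatting step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the actual prompt is cut off above, so `TEMPLATE` and `LABEL_WORDS` are hypothetical placeholders. The split names `dev_r1`/`dev_r2`/`dev_r3` and the label ids (0 entailment, 1 neutral, 2 contradiction) assume the Hugging Face `anli` dataset layout. The splits are passed in as plain lists of dicts, so the sketch runs without downloading anything.

```python
from typing import Dict, List

# Assumed names of the three ANLI validation splits (rounds R1, R2, R3).
VALIDATION_SPLITS = ("dev_r1", "dev_r2", "dev_r3")

# Hypothetical prompt template; the real one used in the PR is not shown.
TEMPLATE = "{premise}\nQuestion: {hypothesis} True, False, or Neither?\nAnswer:"

# Hypothetical verbalizers, indexed by the assumed ANLI label ids:
# 0 = entailment, 1 = neutral, 2 = contradiction.
LABEL_WORDS = ["True", "Neither", "False"]


def format_example(example: Dict) -> Dict[str, str]:
    """Turn one premise/hypothesis/label record into a prompt and a target string."""
    prompt = TEMPLATE.format(premise=example["premise"], hypothesis=example["hypothesis"])
    # Leading space so the target continues the prompt naturally for a causal LM.
    target = " " + LABEL_WORDS[example["label"]]
    return {"prompt": prompt, "target": target}


def format_splits(splits: Dict[str, List[Dict]]) -> Dict[str, List[Dict[str, str]]]:
    """Reformat every validation split that is present; other splits are ignored."""
    return {
        name: [format_example(ex) for ex in splits[name]]
        for name in VALIDATION_SPLITS
        if name in splits
    }


if __name__ == "__main__":
    toy = {
        "dev_r1": [{"premise": "A cat sleeps.", "hypothesis": "An animal rests.", "label": 0}],
        "train_r1": [{"premise": "x", "hypothesis": "y", "label": 1}],
    }
    for name, rows in format_splits(toy).items():
        print(name, rows[0]["prompt"], rows[0]["target"])
```

Keeping the template and verbalizers as module-level constants makes it easy to swap in whatever prompt the evaluation actually settles on without touching the per-split loop.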
1. Evaluated on GPT2
2. Time taken: 3:40:59 on GTX 1080 Ti

Other comments:
1. Prompt template used is the same as XQUAD/PIAF, with minor addition of the question "is...
(per question raised about [slide 6](https://docs.google.com/presentation/d/1LLWFR5AElafxDK4zu4pFdw8-Rz-UGvemG6xcu2uICjE/edit?usp=sharing) at the evaluation meeting on 9/1).
A simple proposal to use promptsource directly so that we don't have to implement prompting from scratch.
This might sound like a bit of restructuring, but for the sake of future compatibility, I propose the following:
1. Move to the `huggingface` trainer: This will help the repo to...
Coordinate with Meg Mitchell about this