evaluation
Code and Data for Evaluation WG
Results: 50 evaluation issues
Russian/English (+shuffle/numbers challenge sets)
NLG
In-Progress
Spanish/German (including COVID challenge sets)
help wanted
NLG
In-Progress
with TURK/ASSET test sets (including bfp02+backtranslation challenge sets)
NLG
In-Progress
use to test generalization to unseen domain; maybe use FLEX?
few_shot
use to test generalization to unseen domain; maybe use FLEX?
few_shot