Add GEM/ToTTo
Can you note what models you were able to run this with?
I tried to run this off your fork, but I wasn't able to load any prompts. Are they merged into the eval_hackathon branch on PS?
Hi @cjlovering, I tried running it with GPT-2. No, the prompts are not yet merged into the eval_hackathon branch on PS.
For `python -m main --device cpu --tasks gem_totto --num_fewshot 0 --limit 5 --model gpt2`, the results were:
gpt2 (), limit: 5, provide_description: False, num_fewshot: 0, batch_size: None
| Task | Prompt |Version| Metric | Value | |Stderr|
|---------|---------------------------|------:|-------------------|------:|---|-----:|
|gem_totto|final_text_describing_table| 0|bleu |13.7120|± |7.6106|
|gem_totto|final_text_describing_table| |rouge1_precision | 0.1914|± |0.0706|
|gem_totto|final_text_describing_table| |rouge1_recall | 0.5547|± |0.1688|
|gem_totto|final_text_describing_table| |rouge1_fmeasure | 0.2827|± |0.0991|
|gem_totto|final_text_describing_table| |rouge2_precision | 0.1455|± |0.0721|
|gem_totto|final_text_describing_table| |rouge2_recall | 0.4259|± |0.1743|
|gem_totto|final_text_describing_table| |rouge2_fmeasure | 0.2151|± |0.1019|
|gem_totto|final_text_describing_table| |rougeL_precision | 0.1825|± |0.0695|
|gem_totto|final_text_describing_table| |rougeL_recall | 0.5261|± |0.1613|
|gem_totto|final_text_describing_table| |rougeL_fmeasure | 0.2691|± |0.0969|
|gem_totto|final_text_describing_table| |rougeLsum_precision| 0.1914|± |0.0706|
|gem_totto|final_text_describing_table| |rougeLsum_recall | 0.5547|± |0.1688|
|gem_totto|final_text_describing_table| |rougeLsum_fmeasure | 0.2827|± |0.0991|
|gem_totto|guess the table page title | 0|bleu | 0.2054|± |0.0348|
|gem_totto|guess the table page title | |rouge1_precision | 0.0097|± |0.0059|
|gem_totto|guess the table page title | |rouge1_recall | 0.1667|± |0.1054|
|gem_totto|guess the table page title | |rouge1_fmeasure | 0.0182|± |0.0112|
|gem_totto|guess the table page title | |rouge2_precision | 0.0000|± |0.0000|
|gem_totto|guess the table page title | |rouge2_recall | 0.0000|± |0.0000|
|gem_totto|guess the table page title | |rouge2_fmeasure | 0.0000|± |0.0000|
|gem_totto|guess the table page title | |rougeL_precision | 0.0097|± |0.0059|
|gem_totto|guess the table page title | |rougeL_recall | 0.1667|± |0.1054|
|gem_totto|guess the table page title | |rougeL_fmeasure | 0.0182|± |0.0112|
|gem_totto|guess the table page title | |rougeLsum_precision| 0.0097|± |0.0059|
|gem_totto|guess the table page title | |rougeLsum_recall | 0.1667|± |0.1054|
|gem_totto|guess the table page title | |rougeLsum_fmeasure | 0.0182|± |0.0112|
|gem_totto|guess the table webpage url| 0|bleu | 1.8425|± |0.7780|
|gem_totto|guess the table webpage url| |rouge1_precision | 0.0341|± |0.0122|
|gem_totto|guess the table webpage url| |rouge1_recall | 0.1893|± |0.0648|
|gem_totto|guess the table webpage url| |rouge1_fmeasure | 0.0577|± |0.0206|
|gem_totto|guess the table webpage url| |rouge2_precision | 0.0148|± |0.0099|
|gem_totto|guess the table webpage url| |rouge2_recall | 0.0905|± |0.0585|
|gem_totto|guess the table webpage url| |rouge2_fmeasure | 0.0254|± |0.0170|
|gem_totto|guess the table webpage url| |rougeL_precision | 0.0341|± |0.0122|
|gem_totto|guess the table webpage url| |rougeL_recall | 0.1893|± |0.0648|
|gem_totto|guess the table webpage url| |rougeL_fmeasure | 0.0577|± |0.0206|
|gem_totto|guess the table webpage url| |rougeLsum_precision| 0.0341|± |0.0122|
|gem_totto|guess the table webpage url| |rougeLsum_recall | 0.1893|± |0.0648|
|gem_totto|guess the table webpage url| |rougeLsum_fmeasure | 0.0577|± |0.0206|
Thanks to @jon-tow for helping me fix the issues I was facing.
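For reference, here is a minimal sketch of how the prompts are expected to be consumed once they land in promptsource. The dataset identifier (`GEM/totto`) and the split used are assumptions; the template name is taken from the results table above and may change before the promptsource PR is merged:

```python
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Load a ToTTo split from the Hugging Face hub (dataset id "GEM/totto" is assumed here).
totto = load_dataset("GEM/totto", split="validation")

# Load the prompt templates for this task. Until the promptsource PR is merged,
# this only works with a promptsource install that includes those templates.
templates = DatasetTemplates("GEM/totto")
print(templates.all_template_names)

# Apply one template to an example; apply() returns the rendered prompt and target text.
template = templates["final_text_describing_table"]  # name taken from the table above
prompt_text, target_text = template.apply(totto[0])
print(prompt_text)
print(target_text)
```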
Great -- it's looking good! Let's wait on merging until the prompts get pulled in on the PS side.
@manandey How is this going? Is GEM/ToTTo merged into promptsource?
Hi @cjlovering, the PR for this was raised in promptsource a while back, but the review process has been slow. Changes have now been requested again on the templates. I hope the PR gets merged soon; I will keep you posted.