Added original template prompt to LAMA-TREx
Added a prompt to LAMA-TREx which is closer to the original LAMA template prompts.
See Jess's comments in #737 for more suggestions on prompts that sound natural to an English speaker with no experience in NLP (i.e., write prompts as if you were talking to a college student who knows nothing about computer science, and avoid the technical jargon you would use with other NLP researchers). For example, you wrote:
What is the missing word to fill the [MASK]?
How about "What could be a missing word at [MASK]?"
Also, you only included two original task prompts. We want at least 5 original task prompts. Thanks!
Sorry, I was not aware of the minimum of 5 prompts. I added 6 prompts and also adapted the other one according to your recommendation.
Marked the task as original task and added 5 prompts per task. However, I am having trouble getting the automatic tests to pass.
Thanks! I rebased your branch to the latest eval-hackathon branch. The prompts themselves look good. Just some housekeeping questions left:
- Why are your “question prompts” not original task? Don't they also test for the same knowledge as the fill-in-the-MASK format?
- Since you're now just using janck/bigscience-lama as opposed to lama/trex, could you remove the prompts in the latter? That might fix the automatic tests.
- Be aware that the question prompts do yield some ungrammatical sentences like the ones in the screenshots below. They're probably okay since the questions are part of the dataset, and this question format is still more natural-sounding than the fill-in-the-MASK format.
@awebson Thanks for rebasing.
- I can change them to the original task. The questions are not part of the original LAMA benchmark though. I created them by translating the original LAMA prompt templates to questions.
- I am not 100% sure what exactly you are proposing. Do you think we should delete the complete LAMA/TREx task? Or should I just delete the prompt from this PR?
- Since I created bigscience-lama, I will correct these question templates in the dataset, so that they are grammatically correct.
- Marking them as original task would be great. Thanks! Few prompts are part of their original datasets, as most datasets predate prompt-based models. The original task flag indicates whether a prompt reflects what the dataset intends to measure, not the exact phrasing of the instructions used in creating the dataset.
- We should remove the LAMA/TREx prompts from this PR, since I assume you will report the results solely from your bigscience-lama? (We cannot remove the LAMA/TREx dataset, which is maintained by the HF datasets team.)
- That'd be ideal. Thanks so much!
@JanKalo this has merge conflicts and build errors, can you merge in changes from main and see if that fixes the build errors?
@jzf2101 Thanks. Yes, I will take care of this today. I was a bit busy over the past few days.
Bump! Thanks!