
Is the model sensitive to prompts?

mengmeng233 opened this issue 2 years ago · 3 comments

🚀 The feature, motivation and pitch

Is the model sensitive to prompts? I loaded my own local dataset following the Samsum dataset example and fine-tuned the 7B-chat model with FSDP. I noticed that the choice of prompt design seems to have a significant impact on the results.

Or should I interpret this as the model's performance varying significantly across different instructions?
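For context, a custom dataset in llama-recipes follows the shape of the samsum example: a loader that formats each sample with a prompt template and tokenizes the prompt and answer separately. Below is a minimal sketch of such a loader; the JSON file paths and the column names ("dialogue", "summary") are assumptions for illustration, not the poster's actual data.

```python
# A minimal sketch of a custom dataset loader in the style of the
# llama-recipes samsum example. File paths and column names are assumptions.
import datasets

def get_custom_dataset(dataset_config, tokenizer, split):
    # data_files mapping is an assumption; point these at your local files
    dataset = datasets.load_dataset(
        "json",
        data_files={"train": "train.json", "validation": "val.json"},
        split=split,
    )

    prompt = "Summarize this dialog:\n{dialog}\n---\nSummary:\n"

    def tokenize_add_label(sample):
        prompt_ids = tokenizer.encode(
            tokenizer.bos_token + prompt.format(dialog=sample["dialogue"]),
            add_special_tokens=False,
        )
        summary_ids = tokenizer.encode(
            sample["summary"] + tokenizer.eos_token, add_special_tokens=False
        )
        return {
            "input_ids": prompt_ids + summary_ids,
            "attention_mask": [1] * (len(prompt_ids) + len(summary_ids)),
            # mask prompt tokens so the loss is computed only on the answer
            "labels": [-100] * len(prompt_ids) + summary_ids,
        }

    return dataset.map(tokenize_add_label, remove_columns=list(dataset.features))
```

Because the prompt tokens are masked out of the loss, the wording of the template shapes what the model conditions on without directly contributing to the training objective.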

Alternatives

No response

Additional context

No response

mengmeng233 avatar Oct 23 '23 06:10 mengmeng233

Hi @mengmeng233, could you provide some example prompts and responses that you've seen and the performance difference that you observe? Also, have you seen this with the base model as well?

albertodepaola avatar Oct 25 '23 16:10 albertodepaola

> Hi @mengmeng233, could you provide some example prompts and responses that you've seen and the performance difference that you observe? Also, have you seen this with the base model as well?

I'm sorry for the delay in my response. I have conducted experiments on different task datasets, all with chat models. However, whether it's the 7B or 13B (LoRA) model, it seems to be quite sensitive to the task's prompt template. For example, in a question-answering task, if I use prompt (1) as follows:

"Please carefully read the following text. Text: {{contexts}} Instruction: Based on the provided text, please make judgments as accurately as possible. Ensure that your answers are strictly factual and avoid any potentially misleading information or statements. You are only allowed to respond with 'yes,' 'no,' or 'maybe.' Question: {{question}} ---\Response: {{decision}}{{eos_token}}"

And I use prompt (2) as follows:

"Please read the following text. Text: {{contexts}} Instruction: Please read the provided text carefully, make reliable judgments, and ensure that your answers are fact-based. While doing so, be aware that the answer may not be straightforward, and the model should consider the complexity of the issue. Avoid any potentially misleading information or statements when responding to the question. Only permitted responses are 'yes,' 'no,' or 'maybe.' Question: {{question}} ---\Response: {{decision}}{{eos_token}}"

The difference in loss on the validation set is approximately 0.04. I set the temperature and top_p to 0.02 during inference, so the generated content doesn't vary much between runs. However, the results generated with these two prompts differ by around 0.2 in accuracy, with the second prompt performing better. I encountered a similar issue in NER tasks.
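For reference, the near-greedy decoding described above would look roughly like this with Hugging Face transformers; model/tokenizer loading and max_new_tokens are assumptions.

```python
# Near-deterministic decoding: temperature and top_p both set to 0.02,
# as described above. Model and tokenizer loading are assumed.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    do_sample=True,      # sampling is on, but temperature=0.02 is effectively greedy
    temperature=0.02,
    top_p=0.02,
    max_new_tokens=8,    # "yes" / "no" / "maybe" needs only a few tokens
)
answer = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```

With sampling this close to greedy, run-to-run variance is negligible, so the 0.2 accuracy gap is attributable to the prompt templates rather than decoding noise.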

I haven't conducted experiments on the base model yet. I have around 500 data samples for my question-answering experiments and around 5000 data samples for my NER experiments.

I'd like to know if anyone has encountered the same issue as me.

mengmeng233 avatar Nov 01 '23 03:11 mengmeng233

In the above prompts I omitted the "[INST]" and "[/INST]" tags, but I inserted them correctly when running the experiments.
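For readers following along, the Llama 2 chat convention wraps the user turn in those tags, with the target answer following the closing tag. A sketch (the helper name is hypothetical; the tags follow Meta's reference code):

```python
# Llama 2 chat format: the user turn is wrapped in [INST] ... [/INST]
# and the target answer follows the closing tag.
B_INST, E_INST = "[INST]", "[/INST]"

def wrap_chat(prompt: str, answer: str, tokenizer) -> str:
    return (
        f"{tokenizer.bos_token}{B_INST} {prompt.strip()} {E_INST} "
        f"{answer.strip()}{tokenizer.eos_token}"
    )
```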

mengmeng233 avatar Nov 01 '23 03:11 mengmeng233

Closing, but please re-open if you try the same with the Llama 3.1 model(s) and run into any issues!

init27 avatar Aug 19 '24 17:08 init27