Dataset returns incorrect answer
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.5.10
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
suppose I have a dataset with one line: The password of our guest wifi is 88886666
and I added this dataset as a document, created an app which uses this dataset. Now when I ask my agent using Google Gemini:
guest wifi password
it returns
ok got it
I believe this is due to incorrect prompt message order. Currently, the dataset info is appended to the end of the prompt_messages list, so the final prompt looks something like this:
0: {system general prompt}
1: guest wifi password
2: I am thinking about how to help you {"action": "dataset-xxx"}
3: The password of our guest wifi is 88886666
If I manually change the order to:
0: {system general prompt}
1: I am thinking about how to help you {"action": "dataset-xxx"}
2: The password of our guest wifi is 88886666
3: guest wifi password
Gemini returns the correct answer.
I believe the user input message should always be at the end.
Also, should the dataset info perhaps play the role of 'assistant' instead of 'user'? I have tested that if I change the role to assistant, the answer from the LLM is more consistent.
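To make the suggestion concrete, here is a minimal sketch of the reordering I have in mind. The `reorder_messages` helper and the dict-based message structure are hypothetical, for illustration only; they are not Dify's actual internal types.

```python
# Hypothetical sketch: build the message list so that retrieved dataset
# context is given the 'assistant' role and the user's query comes last.
def reorder_messages(system, thought, context, query):
    """Order that produced correct answers from Gemini in my tests:
    system -> agent thought -> retrieved context (as assistant) -> query."""
    return [
        {"role": "system", "content": system},
        {"role": "assistant", "content": thought},
        {"role": "assistant", "content": context},  # context as assistant, not user
        {"role": "user", "content": query},         # user input always last
    ]

messages = reorder_messages(
    system="{system general prompt}",
    thought='I am thinking about how to help you {"action": "dataset-xxx"}',
    context="The password of our guest wifi is 88886666",
    query="guest wifi password",
)
print([m["role"] for m in messages])
```

With this layout the model sees the retrieved fact as its own prior knowledge and the question as the final turn, instead of interpreting the dataset line as a new statement from the user.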
✔️ Expected Behavior
returns the correct wifi password in my dataset
❌ Actual Behavior
returns 'ok I got it' as if I'm telling the LLM some information instead of asking a question
Based on your description, it seems like you've identified the root cause of the issue correctly. The order of the system_prompt_orders array in your configuration file indeed determines the order in which the prompts are processed. If the user input message (the query_prompt) is not at the end, it can lead to the issue you're experiencing where the dataset info is acting as 'user' instead of 'assistant'.
To resolve this, you should adjust the system_prompt_orders in your configuration file to ensure that the user input (query) is processed last. This can be done by placing the query_prompt at the end of the system_prompt_orders array. Here's how you can modify your configuration:
```json
{
    "context_prompt": "用户在与一个客观的助手对话。助手会尊重找到的材料,给出全面专业的解释,但不会过度演绎。同时回答中不会暴露引用的材料:\n\n```\n{{context}}\n```\n",
    "system_prompt_orders": [
        "context_prompt",
        "pre_prompt",
        "query_prompt"
    ],
    "query_prompt": "{{query}}",
    "stops": null
}
```
By ensuring the query_prompt is the last element in the system_prompt_orders array, you make sure that the user input is processed at the end, which should resolve the issue of incorrect prompt message order and the dataset info acting incorrectly.
For more information on how the system_prompt_orders array works, you can refer to the api/core/prompt/generate_prompts/baichuan_completion.json and api/core/prompt/generate_prompts/baichuan_chat.json files in the Dify codebase.
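To illustrate the mechanism, here is a rough sketch of how an order list like `system_prompt_orders` could drive prompt assembly. The `assemble_prompt` helper and the English context template are hypothetical simplifications for illustration; the actual Dify implementation differs.

```python
import json

# Simplified config mirroring the shape above, with an English
# context template for readability (hypothetical, not Dify's file).
config = json.loads("""
{
  "context_prompt": "Context:\\n{{context}}\\n",
  "pre_prompt": "",
  "system_prompt_orders": ["context_prompt", "pre_prompt", "query_prompt"],
  "query_prompt": "{{query}}",
  "stops": null
}
""")

def assemble_prompt(config, variables):
    """Concatenate the templates in the order given by
    system_prompt_orders, substituting {{name}} placeholders."""
    parts = []
    for key in config["system_prompt_orders"]:
        template = config.get(key) or ""
        for name, value in variables.items():
            template = template.replace("{{" + name + "}}", value)
        parts.append(template)
    return "".join(parts)

prompt = assemble_prompt(config, {
    "context": "The password of our guest wifi is 88886666",
    "query": "guest wifi password",
})
print(prompt)
```

Because `query_prompt` is the last entry in `system_prompt_orders`, the assembled prompt ends with the user's question rather than the retrieved context.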
Please try this solution and let me know if it resolves your issue or if you need further assistance.
Sources
- api/core/prompt/generate_prompts/baichuan_completion.json
- api/core/prompt/generate_prompts/baichuan_chat.json
You can try expert mode to define your own prompt, @lancemao.