Question Classfier contains useless content
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues search for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (我已阅读并同意 Language Policy).
- [x] [FOR CHINESE USERS] 请务必使用英文提交 Issue,否则会被关闭。谢谢!:)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
0.15.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Add a Question Classfier node, test it in the preview, and you will find irrelevant redundant data in PROCESS DATA:
{
"model_mode": "chat",
"prompts": [
{
"role": "system",
"text": "<NORMAL CONTENT>",
"files": []
},
===== Below is irrelevant content =======
{
"role": "user",
"text": "\n { \"input_text\": [\"I recently had a great experience with your company. The service was prompt and the staff was very friendly.\"],\n \"categories\": [{\"category_id\":\"f5660049-284f-41a7-b301-fd24176a711c\",\"category_name\":\"Customer Service\"},{\"category_id\":\"8d007d06-f2c9-4be5-8ff6-cd4381c13c60\",\"category_name\":\"Satisfaction\"},{\"category_id\":\"5fbbbb18-9843-466d-9b8e-b9bfbb9482c8\",\"category_name\":\"Sales\"},{\"category_id\":\"23623c75-7184-4a2e-8226-466c2e4631e4\",\"category_name\":\"Product\"}],\n \"classification_instructions\": [\"classify the text based on the feedback provided by customer\"]}\n",
"files": []
},
{
"role": "assistant",
"text": "\n```json\n {\"keywords\": [\"recently\", \"great experience\", \"company\", \"service\", \"prompt\", \"staff\", \"friendly\"],\n \"category_id\": \"f5660049-284f-41a7-b301-fd24176a711c\",\n \"category_name\": \"Customer Service\"}\n```\n",
"files": []
},
{
"role": "user",
"text": "\n {\"input_text\": [\"bad service, slow to bring the food\"],\n \"categories\": [{\"category_id\":\"80fb86a0-4454-4bf5-924c-f253fdd83c02\",\"category_name\":\"Food Quality\"},{\"category_id\":\"f6ff5bc3-aca0-4e4a-8627-e760d0aca78f\",\"category_name\":\"Experience\"},{\"category_id\":\"cc771f63-74e7-4c61-882e-3eda9d8ba5d7\",\"category_name\":\"Price\"}],\n \"classification_instructions\": []}\n",
"files": []
},
{
"role": "assistant",
"text": "\n```json\n {\"keywords\": [\"bad service\", \"slow\", \"food\", \"tip\", \"terrible\", \"waitresses\"],\n \"category_id\": \"f6ff5bc3-aca0-4e4a-8627-e760d0aca78f\",\n \"category_name\": \"Experience\"}\n```\n",
"files": []
},
===== Below is normal content =======
{
"role": "user",
"text": "\n '{\"input_text\": [\"30岁的宝宝该吃什么\"],',\n '\"categories\": [{\"category_id\": \"1\", \"category_name\": \"婴幼儿饮食\"}, {\"category_id\": \"2\", \"category_name\": \"成人饮食\"}, {\"category_id\": \"1739426736318\", \"category_name\": \"其他\"}], ',\n '\"classification_instructions\": [\"根据用户输入判断用户是想询问婴幼儿饮食问题还是成人饮食问题\"]}'\n",
"files": []
},
{
"role": "user",
"text": "30岁的宝宝该吃什么",
"files": []
}
],
"usage": {
"prompt_tokens": 755,
"prompt_unit_price": "0.0",
"prompt_price_unit": "0.0",
"prompt_price": "0.0",
"completion_tokens": 340,
"completion_unit_price": "0.0",
"completion_price_unit": "0.0",
"completion_price": "0.0",
"total_tokens": 1095,
"total_price": "0.0",
"currency": "USD",
"latency": 31.721683834999567
},
"finish_reason": "stop"
}
✔️ Expected Behavior
PROCESS DATA does not contain extraneous irrelevant data
❌ Actual Behavior
PROCESS DATA contain extraneous irrelevant data
@bighamx Hello there! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!
To address the issue of irrelevant redundant data in the PROCESS DATA when using the QuestionClassifierNode in Dify version 0.15.3, you can modify the process_data dictionary within the _run method of the QuestionClassifierNode class. Specifically, you can selectively include only the necessary keys and values in the process_data dictionary. For example, if you only need the model_mode and finish_reason, you can adjust the dictionary like this:
process_data = {
"model_mode": model_config.mode,
"finish_reason": finish_reason,
}
This modification will exclude the prompts and usage data from the process_data, thereby removing potentially irrelevant or redundant information. Adjust the dictionary according to your specific requirements for what constitutes "irrelevant redundant data" in your context [1].
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other | Bug Report
https://github.com/langgenius/dify/blob/a3d3e30e3a884b549fbfe3386f2a81562fb81ba5/api/core/workflow/nodes/question_classifier/template_prompts.py#L17
These prompts are used to instruct the model to output in the required format, ensuring that the Question Classifier works properly.
I have reservations about the useless of the example, as it could potentially mislead not smart enough models. Additionally, using a shorter category_id might be better.
I would like to understand why input_text is already written in QUESTION_CLASSIFIER_USER_PROMPT_3, yet it needs to be repeated again.
冗余