LLM JSON Output Incorrectly Extracts Data from "<think>" Label
Self Checks
- [x] I have read the Contributing Guide and Language Policy.
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report, otherwise it will be closed.
- [x] [Chinese & Non-English users] Please submit in English, otherwise the issue will be closed :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.9.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Although I use structured output, when the data structure appears within the "<think>" tag, the JSON is extracted from inside that tag.
✔️ Expected Behavior
Should extract the formatted output from outside the "<think>" tag.
❌ Actual Behavior
No response
Hi @Mister-leo! I'm Dosu and I’m helping the dify team.
This is a known bug in Dify v1.9.2: when JSON structured output appears inside <think> tags, the parser may incorrectly extract or fail to extract the intended output. Recent updates have addressed this by stripping out <think>...</think> content before parsing JSON, ensuring only the correct structured output is extracted. The fix was merged after your version—see PR #26992 for details and code changes.
If you need a workaround, you can manually patch your deployment to remove <think> tags from the LLM output before JSON parsing, similar to the regex used in the PR:
import re

# Drop any <think>...</think> reasoning block before JSON parsing.
if "<think>" in result_text:
    result_text = re.sub(r"<think[^>]*>[\s\S]*?</think>", "", result_text, flags=re.IGNORECASE)
Upgrading to the latest release will include this fix and improve structured output extraction reliability.
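For anyone patching by hand, here is a minimal self-contained sketch of the strip-then-parse idea. It reuses the regex from the workaround; `strip_think_and_parse` is a hypothetical helper name (not a Dify function), and it assumes that whatever remains after removing the reasoning block is a single JSON document.

```python
import json
import re

# Same pattern as the workaround above: an opening <think ...> tag,
# any content (non-greedy), then the closing </think> tag.
THINK_BLOCK = re.compile(r"<think[^>]*>[\s\S]*?</think>", flags=re.IGNORECASE)


def strip_think_and_parse(result_text: str) -> dict:
    """Hypothetical helper: drop reasoning blocks, then parse what is left as JSON."""
    cleaned = THINK_BLOCK.sub("", result_text).strip()
    return json.loads(cleaned)


# The reasoning echoes a schema-like object; only the final answer should be parsed.
raw = (
    '<think>The schema looks like {"name": "string", "age": "integer"}.</think>\n'
    '{"name": "Alice", "age": 30}'
)
print(strip_think_and_parse(raw))  # {'name': 'Alice', 'age': 30}
```

If the model wraps the final answer in extra prose or a code fence, `json.loads` will still raise, so this only covers the reasoning-block case described in this issue.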
Same problem here on version 1.8.1. Sometimes a reasoning model repeats the JSON schema in its thinking part and then outputs the final JSON in the answer, but Dify extracts the JSON from the reasoning instead of from the response.
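A rough illustration of how this can happen, assuming an extractor that simply takes the first decodable JSON object in the raw text. `first_json_object` is a hypothetical stand-in written for this example, not Dify's actual parser.

```python
import json


def first_json_object(text: str):
    """Illustrative naive extractor: return the first decodable JSON object, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at index `start`.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            continue
    return None


# The model repeats the schema while reasoning, then gives the real answer.
raw = (
    '<think>The schema is {"city": "string"}, so I will fill it in.</think>'
    '{"city": "Berlin"}'
)
print(first_json_object(raw))  # {'city': 'string'}
```

The schema echoed inside the reasoning comes first in the text, so it is what gets returned, rather than the final answer.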
Hi, @Mister-leo. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.
Issue Summary:
- You reported that in Dify v1.9.2, the LLM incorrectly extracted structured output from inside the "<think>" tags instead of outside.
- This was a known bug affecting versions around v1.8.1 and v1.9.2, causing JSON extraction from reasoning rather than the final answer.
- The issue was fixed after v1.9.2 by stripping out "<think>...</think>" content before JSON parsing (see PR #26992).
- A manual regex patch was suggested as a temporary workaround before upgrading.
- Upgrading to the latest release resolves this extraction problem.
Next Steps:
- Please confirm if this issue is still relevant with the latest version of Dify; if so, you can keep the discussion open by commenting here.
- Otherwise, I will automatically close this issue in 15 days.
Thanks for your understanding and contribution!