
LLM JSON Output Incorrectly Extracts Data from "<think>" Label

Open Mister-leo opened this issue 2 months ago • 3 comments

Self Checks

  • [x] I have read the Contributing Guide and Language Policy.
  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report, otherwise it will be closed.
  • [x] [Chinese & Non-English Users] Please submit in English, otherwise the issue will be closed :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.9.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Although I use structured output, when the data structure also appears inside the "<think>" label, the structured output is incorrectly extracted from inside the label instead of from outside it.

✔️ Expected Behavior

The formatted output should be extracted from outside the "<think>" label.

❌ Actual Behavior

No response

Mister-leo avatar Nov 09 '25 15:11 Mister-leo

Hi @Mister-leo! I'm Dosu and I’m helping the dify team.

This is a known bug in Dify v1.9.2: when JSON structured output appears inside <think> tags, the parser may incorrectly extract or fail to extract the intended output. Recent updates have addressed this by stripping out <think>...</think> content before parsing JSON, ensuring only the correct structured output is extracted. The fix was merged after your version—see PR #26992 for details and code changes.

If you need a workaround, you can manually patch your deployment to remove <think> tags from the LLM output before JSON parsing, similar to the regex used in the PR:

import re
if "<think>" in result_text:
    result_text = re.sub(r"<think[^>]*>[\s\S]*?</think>", "", result_text, flags=re.IGNORECASE)

Upgrading to the latest release will include this fix and improve structured output extraction reliability.
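
For anyone patching by hand, the workaround above can be sketched end to end as follows. This is a minimal illustration, not Dify's actual code: the helper name `parse_structured_output` and the sample string are assumptions made for the example, and only the regex comes from the snippet above.

```python
import json
import re

# Same pattern as the workaround snippet, precompiled once.
THINK_BLOCK = re.compile(r"<think[^>]*>[\s\S]*?</think>", re.IGNORECASE)


def parse_structured_output(result_text: str) -> dict:
    """Hypothetical helper: drop <think>...</think> blocks, then parse JSON."""
    cleaned = THINK_BLOCK.sub("", result_text).strip()
    return json.loads(cleaned)


# Sample model output (made up): the reasoning repeats the schema,
# and the real answer follows the closing tag.
raw = '<think>Schema is {"name": "string"}</think>\n{"name": "Alice"}'
print(parse_structured_output(raw))  # {'name': 'Alice'}
```

Applying the substitution unconditionally also covers upper-case tags such as `<THINK>`, which a case-sensitive `"<think>" in result_text` check would skip.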

To reply, just mention @dosu.

dosubot[bot] avatar Nov 09 '25 15:11 dosubot[bot]

Same problem. I am on version 1.8.1 now. Sometimes a reasoning model repeats the JSON schema in its thinking part and then outputs the final JSON in the answer, but Dify extracts the JSON from the reasoning instead of from the response.
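
The failure mode described here can be reproduced in isolation. The sketch below is an assumption about how a "take the first JSON object" strategy behaves in general. It is not Dify's parser, and the function name `first_json_object` and the sample text are invented for the illustration.

```python
import json


def first_json_object(text: str):
    """Naive, illustrative extractor: return the first decodable JSON object."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch == "{":
            try:
                # raw_decode parses one JSON value starting at index i.
                obj, _end = decoder.raw_decode(text, i)
                return obj
            except json.JSONDecodeError:
                continue
    return None


# Made-up output: the schema is echoed in the reasoning, the answer comes last.
output = (
    '<think>The schema is {"city": "string"}, so I will fill it in.</think>\n'
    '{"city": "Paris"}'
)
print(first_json_object(output))  # {'city': 'string'}
```

Because the schema echoed inside the reasoning is itself valid JSON, a first-match strategy returns it and never reaches the real answer after the closing tag.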

Jimmy-L99 avatar Nov 10 '25 01:11 Jimmy-L99

Hi, @Mister-leo. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.

Issue Summary:

  • You reported that in Dify v1.9.2, the LLM incorrectly extracted structured output from inside the "<think>" tags instead of outside.
  • This was a known bug affecting versions around v1.8.1 and v1.9.2, causing JSON extraction from reasoning rather than the final answer.
  • The issue was fixed after v1.9.2 by stripping out "<think>...</think>" content before JSON parsing (see PR #26992).
  • A manual regex patch was suggested as a temporary workaround before upgrading.
  • Upgrading to the latest release resolves this extraction problem.

Next Steps:

  • Please confirm if this issue is still relevant with the latest version of Dify; if so, you can keep the discussion open by commenting here.
  • Otherwise, I will automatically close this issue in 15 days.

Thanks for your understanding and contribution!

dosubot[bot] avatar Dec 13 '25 16:12 dosubot[bot]