Calling the API 'see' method, the LLM often returns data in one go instead of streaming it gradually
Self Checks
- [x] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.2.0 and 1.3.0
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Call the API 'see' method, but the LLM often returns all of the data in one go instead of streaming it gradually. When the chatflow is relatively simple and every response comes from the LLM, the problem basically does not appear. However, if the flow returns data along the way through direct replies (Answer nodes) rather than the LLM, and there are several such direct answers before the LLM, then even when the LLM answers later, SSE is not used: all content is returned at once, including the reasoning ('thinking') content.
chat-flow
Log showing the result being returned directly (all at once)
The issue also appears frequently on the debugging page
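To make the reproduction easier to follow, here is a minimal client-side sketch of how such a streaming call can be made against a self-hosted Dify instance. The base URL, API key, query, and user id are placeholder values, and the SSE field names are assumed from the public chat API; adjust them for your setup. If streaming works, many small chunks are printed one after another; when the reported bug occurs, the whole answer arrives in one (or very few) large events instead.

```python
import json
import requests

# Placeholder values for illustration -- adjust for your self-hosted instance.
API_BASE = "http://localhost/v1"
API_KEY = "app-xxxxxxxx"

resp = requests.post(
    f"{API_BASE}/chat-messages",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "inputs": {},
        "query": "hello",
        "response_mode": "streaming",  # ask for SSE output instead of blocking mode
        "user": "debug-user",
    },
    stream=True,  # let requests yield the response body incrementally
)

# Each SSE event is a line starting with "data: " followed by a JSON payload.
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue
    try:
        event = json.loads(line[len("data: "):])
    except json.JSONDecodeError:
        continue  # skip keep-alive or non-JSON lines
    # With proper streaming, "answer" arrives as many small fragments;
    # with the reported bug, it arrives as one large block.
    print(event.get("event"), event.get("answer", ""))
```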
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Hello @liangyuan1! I'm Dosu, a bot here to help you with bugs, questions, and becoming a contributor. I'm ready to assist you while we wait for a human maintainer. How can I help you today?
To ensure the 'see' method of the API returns data gradually in a streaming manner, you need to set the stream parameter to True when invoking the model. This approach is demonstrated in the integration tests for Dify, where setting stream=True ensures that the response is returned as a Generator, allowing you to process chunks of data as they are received [1].
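As a generic illustration of that pattern (the names below are hypothetical, not Dify's actual internal API): a model invocation made with stream=True can return a Generator that the caller consumes chunk by chunk, whereas stream=False returns the complete payload in one piece.

```python
from typing import Generator, Union

# Hypothetical sketch of the stream=True / Generator pattern described above.
def invoke_model(prompt: str, stream: bool) -> Union[str, Generator[str, None, None]]:
    """Return the full text when stream=False, or a generator of chunks when stream=True."""
    chunks = ["Hel", "lo ", "wor", "ld"]
    if not stream:
        return "".join(chunks)          # blocking: one complete payload
    return (chunk for chunk in chunks)  # streaming: caller consumes chunk by chunk

result = invoke_model("hi", stream=True)
if isinstance(result, str):
    print(result)                       # blocking path: everything at once
else:
    for chunk in result:                # streaming path: handle pieces as they arrive
        print(chunk, end="", flush=True)
```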
Additionally, there is a similar unsolved discussion regarding non-streaming output from LLM models in Dify usage, which might be relevant to your issue. You can find it here [2].
To continue talking to Dosu, mention @dosu.
What is 'see' in the API? I found the same problem: when the chatflow is simple, the output is streaming; when the chatflow has many nodes, the output is blocking.
> What is 'see' in the API? I found the same problem: when the chatflow is simple, the output is streaming; when the chatflow has many nodes, the output is blocking.

I entered it incorrectly; it should be 'sse'. How can the blocking caused by too many nodes in the flow be solved? Do you know? I want the final output of the large language model to be returned as a stream.
Could you provide the complete DSL file or a screenshot? I want to verify whether #19552 has fixed this issue.
Hi, @liangyuan1. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.
Issue Summary:
- You reported that in versions 1.2.0 and 1.3.0 (self-hosted via Docker), the API's "sse" method often returns the full LLM response at once instead of streaming it gradually.
- This behavior is especially noticeable in chat flows with multiple nodes or mixed response types.
- Suggestions were made to verify that the stream parameter is set to True, with references to integration tests and related discussions.
- Another user confirmed similar streaming issues in complex chat flows, while simple flows stream correctly.
- A request was made for the DSL file to verify if a recent fix (#19552) resolves the problem, but no further updates have been provided.
Next Steps:
- Please let me know if this issue is still relevant to the latest version of Dify by commenting here to keep the discussion open.
- Otherwise, I will automatically close this issue in 15 days.
Thank you for your understanding and contribution!