Dify Chat Interface Not Displaying LLM Responses in a Streaming Manner
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.4.0
Cloud or Self Hosted
Cloud
Steps to reproduce
- Create a conversation that requires a lengthy response (as shown in the screenshot: "Detailed introduction to deep neural networks")
- Send the request and observe the response process
✔️ Expected Behavior
As the LLM generates content, it should stream the generated content to the chat interface in real-time, allowing users to read the already generated portions while waiting for the complete response.
❌ Actual Behavior
Users need to wait for the entire LLM processing (nearly 50 seconds) to complete before seeing the full response all at once. There are no updates to the chat interface during the waiting period.
Additional Information
From the screenshot, it's visible that the LLM node has completed processing (marked with a green checkmark), indicating that the backend has successfully generated all content, but there's an issue with the frontend display mechanism.
I hope the development team can prioritize fixing this issue, as streaming responses are crucial for improving user experience, especially when generating lengthy content.
Based on your description and the research into Dify version 1.4.0, here's what's known about streaming LLM responses in the chat interface:
- Streaming Implementation: Dify's chat interface is designed to support real-time streaming of LLM responses using a `ChunkChatCompletionResponse`. This mechanism sends data in chunks with `Content-Type: text/event-stream`; each chunk starts with `data:` and is separated by two newlines. The frontend should update as each chunk arrives, allowing users to see the response as it is generated rather than waiting for the entire output to be ready [source].
- Known Issues and Fixes: There was a previous issue where streaming mode did not work as expected with some LLM models, but this was fixed in version 1.4.0 [source]. Another resolved issue involved ensuring that streaming content starts with `data:` [source].
- Potential Causes for Your Issue:
  - Since the backend is completing successfully (green checkmark), the problem is likely on the frontend, possibly with how it handles or renders the incoming stream.
  - If the frontend is not processing the `text/event-stream` correctly, or if the connection is being buffered (by a proxy, CDN, or browser), the user may see the entire response only after completion.
  - There could also be a regression or a configuration issue in the cloud deployment affecting streaming delivery to the browser.
- Troubleshooting Steps:
  - Confirm that the API request for your chat is using `response_mode: streaming` and that the network tab in your browser shows a `text/event-stream` response arriving in chunks (a minimal verification sketch follows this response).
  - Check for any frontend errors in the browser console that might indicate issues with rendering or handling the stream.
  - If you are behind a proxy or CDN, ensure it is configured to support HTTP streaming and does not buffer the response.
  - Review the frontend code or deployment logs for any recent changes that might affect streaming.
- References for Implementation Details:
If you continue to experience this issue on the cloud version and the above checks do not reveal the cause, it may be a regression or a deployment-specific problem. There are no open issues currently tracking this exact frontend streaming problem in 1.4.0, so your report is valuable for further investigation.
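As a quick way to run the first troubleshooting step outside the browser, the sketch below calls the chat API directly and prints each SSE event with its arrival time, so you can see whether chunks really arrive incrementally. This is a minimal sketch only: it assumes the standard `/v1/chat-messages` endpoint, an app API key in a `DIFY_API_KEY` environment variable, and the `requests` library; adjust the base URL and payload for your own app or a self-hosted deployment.

```python
# Minimal sketch: stream a Dify chat completion and print when each SSE event arrives.
# Assumptions: the /v1/chat-messages endpoint, an app API key in DIFY_API_KEY,
# and the `requests` library. Adjust api_base for self-hosted deployments.
import json
import os
import time

import requests

api_base = "https://api.dify.ai/v1"        # assumption: cloud API base URL
api_key = os.environ["DIFY_API_KEY"]       # assumption: app API key in the environment

payload = {
    "inputs": {},
    "query": "Detailed introduction to deep neural networks",
    "response_mode": "streaming",          # ask Dify for SSE chunks
    "user": "streaming-debug",
}

start = time.time()
with requests.post(
    f"{api_base}/chat-messages",
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,
    stream=True,                           # do not let requests buffer the body
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE events are "data: {...}" lines separated by blank lines.
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        # Print elapsed time per event so a burst at the end is immediately visible.
        print(f"{time.time() - start:7.3f}s  {event.get('event')}  "
              f"{event.get('answer', '')[:40]!r}")
```

If the printed timestamps are spread across the whole generation, the backend is streaming correctly and the problem is in the frontend rendering; if they cluster at the end, the delay is upstream of the browser.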
Further Evidence of Non-Streaming Behavior
After further investigation of the API network calls, I've discovered that this streaming issue is not just limited to the frontend display, but also exists in the backend API response behavior.
As shown in the attached screenshots, the LLM node started execution at 14:30:04.573, but the first "message" event was only sent at 14:30:45.103 - more than 40 seconds later. All subsequent "message" events were then delivered in rapid succession within milliseconds of each other (14:30:45.111, 14:30:45.116, 14:30:45.121, etc.), with the final "message_end" event occurring at 14:30:49.492.
This timing pattern clearly indicates that despite using the text/event-stream format (which should support proper streaming), the LLM-generated content is essentially being accumulated on the backend and then sent all at once near the end of processing, rather than being streamed in real-time as it's generated.
This explains why users experience a long wait with no visible response, followed by the sudden appearance of the complete answer. For a true streaming experience, we would expect to see message events distributed throughout the entire ~45-second processing period, not clustered in the final few seconds.
This issue significantly diminishes the user experience advantages that streaming responses are designed to provide, particularly for longer generations that take substantial time to complete.
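To make the burst pattern concrete, here is a tiny sketch that computes the gaps between the timestamps quoted above (node start, the first few message events, and message_end). Only the values visible in the screenshots are listed; the intermediate message events are omitted.

```python
# Tiny sketch: gaps between the event timestamps reported above.
# Only timestamps visible in the screenshots are listed; intermediate events are omitted.
from datetime import datetime

timestamps = {
    "llm_node_start": "14:30:04.573",
    "first_message":  "14:30:45.103",
    "second_message": "14:30:45.111",
    "third_message":  "14:30:45.116",
    "message_end":    "14:30:49.492",
}

parsed = {name: datetime.strptime(t, "%H:%M:%S.%f") for name, t in timestamps.items()}
start = parsed["llm_node_start"]
for name, t in parsed.items():
    print(f"{name:15s} +{(t - start).total_seconds():7.3f}s after node start")

# The output shows a ~40.5 s silent gap before the first chunk, then everything
# arriving inside the last ~4.4 s, which is the opposite of token-by-token streaming.
```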
You can refer to this #19324
The quoted issue talked about the self-hosted version, where an nginx configuration change might solve the problem. However, this is the official cloud SaaS product, where we cannot alter Dify's own server settings.
Edit: And yes, this issue affects my deployments too. I was hoping to get an eager, token-by-token SSE start from the last LLM node, but alas it behaves like a blocking call.
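As background for the buffering discussion above: for SSE to feel token-by-token, every hop has to flush each chunk as soon as it is produced. The generic sketch below is not Dify's actual code; it uses Flask purely to illustrate the two ingredients usually involved: a generator that yields `data:` chunks immediately, and an `X-Accel-Buffering: no` header so an nginx reverse proxy does not buffer the response.

```python
# Generic illustration (not Dify's code): an SSE endpoint that streams as it generates.
# Two things matter for token-by-token delivery: yield each chunk immediately,
# and tell intermediaries (e.g. nginx) not to buffer via X-Accel-Buffering: no.
import json
import time

from flask import Flask, Response

app = Flask(__name__)

def fake_llm_tokens():
    """Stand-in for an LLM that produces tokens gradually."""
    for token in ["Deep ", "neural ", "networks ", "are ", "..."]:
        time.sleep(0.3)  # simulate generation latency
        yield token

@app.route("/sse-demo")
def sse_demo():
    def event_stream():
        for token in fake_llm_tokens():
            # Each SSE event: "data: <payload>" followed by a blank line.
            yield f"data: {json.dumps({'event': 'message', 'answer': token})}\n\n"
        yield f"data: {json.dumps({'event': 'message_end'})}\n\n"

    return Response(
        event_stream(),
        mimetype="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",  # ask nginx not to buffer this response
        },
    )

if __name__ == "__main__":
    app.run(port=5001, threaded=True)
```

If either ingredient is missing anywhere along the chain (application, proxy, or CDN), the client sees exactly the pattern reported here: silence, then the whole answer at once.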
@Yingjie-Zhao this issue https://github.com/langgenius/dify/issues/19891 might offer a workaround, whereby you "package" complex behavior that includes parallelism as its own workflow (think of it as a "function" in traditional programming), and then call this workflow within your chat flow.
But this solution feels like a bit of a temporary hack, and I trust the Dify team can fix the issue properly.
You can refer to this #19324
The Dify team has supported these modifications in v1.4.0, but I still can't use streaming output currently.
I raised this issue (#19891), but recently I found that it couldn’t resolve all problems. Some truly complex tasks still can’t use streaming output at present.
Sorry! I tested this workaround again and it does still work. My new problem is with the Variable Aggregator node: if you want to use streaming output, you must not use this node in a Chatflow.
@tipani86 @wbext @Reverse-Flash-Kamen Thank you all for providing the valuable information and insights.
I'd like to clarify that the workflow used in this issue is extremely simple - it contains only one LLM node and one reply node. Therefore, it doesn't involve the complex loop scenarios mentioned in #19891 that could affect streaming response performance. Additionally, since I'm using the Cloud version of Dify, there are no Nginx configuration issues to debug or troubleshoot.
Furthermore, although this issue was reported using the Cloud version of Dify, I have actually observed the same phenomenon in the Self-hosted version as well. This leads me to believe that this issue is different from other previously reported streaming response issues.
In summary, this appears to be an internal issue within Dify itself that requires attention from the Dify development team. If anyone has additional thoughts or information regarding this problem, please feel free to share them. Thank you again for the discussion and your continued engagement on this matter.
Maybe you used an API that does not support streaming output; as far as I know, some APIs default to blocking mode.
As you can see in the screenshot I posted above, I have set "response_mode": "streaming" in the request body.
I see that your request has the parameter "response_mode": "streaming", but I think that request only goes to Dify itself. I haven't used Aliyun, but I found in their API documentation that the streaming parameter defaults to false.
You are right that the streaming parameter of the models provided by Aliyun defaults to false. However, I can't find any configuration in Dify to turn it on or off. At the same time, I found something new today: I rebuilt a self-hosted Dify v1.4.1 with the same Aliyun Qwen model, and it streams responses as expected. So I don't think this has anything to do with the Aliyun model configuration; this issue is about the cloud version of Dify. Thanks again for your kind engagement, and please feel free to share any other insights.
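For anyone who wants to rule out the model provider entirely, the sketch below calls the Qwen model directly with streaming enabled, bypassing Dify. It is only a sketch under assumptions: it uses Aliyun's OpenAI-compatible endpoint and the `openai` Python client, with the key in a `DASHSCOPE_API_KEY` environment variable; the base URL and model name may differ for your account.

```python
# Minimal sketch: check that the Aliyun Qwen model itself streams, independent of Dify.
# Assumptions: Aliyun's OpenAI-compatible endpoint, the `openai` Python client (v1+),
# an API key in DASHSCOPE_API_KEY, and the "qwen-plus" model name; adjust as needed.
import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

start = time.time()
stream = client.chat.completions.create(
    model="qwen-plus",  # assumed model name
    messages=[{"role": "user",
               "content": "Detailed introduction to deep neural networks"}],
    stream=True,        # the provider-side streaming switch
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    # If tokens print gradually here but not in the Dify UI, the problem is inside Dify.
    print(f"{time.time() - start:6.2f}s  {delta!r}")
```

If the tokens arrive gradually here, the provider streams fine and the accumulation happens inside Dify's cloud pipeline, which matches the observation above.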
Has it been resolved now? My 1.4.1 still has this issue
Hi, @Yingjie-Zhao. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.
Issue Summary:
- You reported that in Dify cloud version 1.4.0, LLM responses do not stream incrementally but appear all at once after processing.
- Your investigation of the API network calls showed that the backend delays sending chunks until near completion, indicating an internal streaming issue in the cloud version.
- This problem is distinct from previous streaming issues and is not affected by model or proxy settings.
- Other users have noted similar issues and partial workarounds, but the problem remains unresolved as of version 1.4.1.
Next Steps:
- Please let me know if this streaming issue is still relevant with the latest version of Dify by commenting on this issue.
- If I do not hear back within 15 days, this issue will be automatically closed.
Thank you for your understanding and contribution!
I'm still facing this issue on the cloud version, is there any work-around?
Sorry for the late reply. I haven't checked the issue again yet, but since other people have run into the same problem, it probably still exists. Could you please reopen this GitHub issue?
@crazywoola The user @Yingjie-Zhao has confirmed that the streaming issue in the cloud version is still present and has requested that the issue be reopened for further assistance. Could you please take a look?