Memory leak under high load

Open VolXD opened this issue 2 weeks ago • 5 comments

Self Checks

  • [x] I have read the Contributing Guide and Language Policy.
  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report, otherwise it will be closed.
  • [x] [Chinese & Non-English Users] Please submit in English, otherwise the issue will be closed :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

v1.10.1-fix.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Restart the Docker image, then run three consecutive 100-thread load tests.

✔️ Expected Behavior

Memory is released after the tests finish.

❌ Actual Behavior

Memory usage increases during testing and is not released.

VolXD avatar Dec 26 '25 01:12 VolXD

Hi @VolXD! I'm Dosu and I’m helping the dify team.

Based on the code and recent changes, the most likely cause of the memory leak under high load is unbounded event buffering in the EventManager component. Events are collected and stored in a list but are never explicitly cleared after processing, which can cause memory usage to grow and not be released, especially during high-concurrency tests. The dispatcher and workflow orchestration also coordinate event processing but do not clear the buffered events, so memory can accumulate if the workload is sustained or if events are retained indefinitely [EventManager source].

To confirm this, you can monitor the size of the event buffer during and after your tests, or use memory profiling tools (like tracemalloc or objgraph) to check for lingering event objects. Heap dumps after test completion can also help pinpoint which objects are not being released.
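As a rough illustration of the tracemalloc approach, the sketch below compares two snapshots and lists the source lines whose allocations grew between them. It uses only the standard library. The helper name `top_growth` and the stand-in workload are assumptions for illustration; in a real check the snapshots would be taken inside the API process before and after a test run.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size grew most between snapshots."""
    # compare_to() yields StatisticDiff entries, largest absolute change first.
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Stand-in for the load test: allocations that stay referenced afterwards.
    retained = [bytes(1024) for _ in range(10_000)]

    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after):
        print(stat)
    tracemalloc.stop()
```

Lines that keep appearing at the top across repeated runs are the ones worth inspecting first.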

As a mitigation, consider restarting the service between heavy test runs, limiting concurrency, or patching the code to clear the event buffer after execution completes. If you can, try instrumenting the code to log the length of the _events list in EventManager before and after tests.
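The instrument-and-clear idea could look roughly like the sketch below. `EventBuffer` and `run_and_release` are hypothetical stand-ins written for this example, not Dify's actual EventManager API; only the `_events` list name is taken from the description above, and the real class may be structured differently.

```python
import logging

logger = logging.getLogger("event_buffer_probe")


class EventBuffer:
    """Hypothetical stand-in for a manager that collects events in a list."""

    def __init__(self):
        self._events = []

    def collect(self, event):
        self._events.append(event)

    def clear(self):
        self._events.clear()


def run_and_release(buffer, events):
    """Collect events, log the buffer size, then clear it once the run is done."""
    for event in events:
        buffer.collect(event)
    peak = len(buffer._events)
    logger.info("events buffered during run: %d", peak)
    buffer.clear()
    logger.info("events buffered after clear: %d", len(buffer._events))
    return peak
```

If the logged length keeps rising across runs without a corresponding clear, that would support the buffering hypothesis; if it stays flat, the leak is probably elsewhere.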

If you have logs, heap snapshots, or can share more details about which components are consuming the most memory, that would help narrow down the root cause further.

To reply, just mention @dosu.



dosubot[bot] avatar Dec 26 '25 01:12 dosubot[bot]

same behavior in v1.8.1

VolXD avatar Dec 26 '25 02:12 VolXD

That confirms the memory leak is a longstanding issue and not specific to recent versions. The root cause is likely the unbounded event buffering in the EventManager, which collects events in a list but never clears it after processing—so memory usage keeps growing under high load and isn't released after tests are done (see code).

To help confirm this, you could instrument the code to log the length of the _events list before and after your tests, or use memory profilers like tracemalloc or objgraph to see if event objects are lingering. As a temporary workaround, restarting the service between heavy test runs or limiting concurrency may help until a code fix is available.
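If installing objgraph is not an option, a similar check for lingering objects can be approximated with the standard `gc` module. The sketch below counts live, GC-tracked objects by class name; `count_instances` and `DemoEvent` are names made up for this example, and the class names to search for in Dify would need to be taken from its source.

```python
import gc
from collections import Counter


def count_instances(type_names):
    """Count live, GC-tracked objects whose class name is in type_names."""
    wanted = set(type_names)
    counts = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        if name in wanted:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    class DemoEvent:
        """Illustrative event type, not a Dify class."""

    held = [DemoEvent() for _ in range(3)]
    print(count_instances(["DemoEvent"]))
    del held
    gc.collect()
    print(count_instances(["DemoEvent"]))
```

Comparing the counts before a test run and some time after it finishes shows whether a given type is still being held.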

If you can share profiling results or stats on the event buffer, that could help maintainers prioritize a fix.


dosubot[bot] avatar Dec 26 '25 02:12 dosubot[bot]

docker inspect -f '{{.State.Pid}}' <container>

sudo perf mem -p record -- sleep 30

fatelei avatar Dec 26 '25 02:12 fatelei

same behavior in v1.8.1

Please upgrade to the latest version.

@VolXD, regarding this issue, please provide more details if possible.

crazywoola avatar Dec 26 '25 02:12 crazywoola