Memory leak in Docker containers and on local machines
I'm not sure if this is a bug, but it seems I'm not the only one encountering this memory leak. The link above is the related post containing the problem description and library versions.
Here is what I found while debugging: I have been digging into this for several days, and the issue seems to be related to the LiveKit FFI library. When a user connects to a LiveKit room, the following requests are sent to the FFI library sequentially:
connect -> create_audio_track -> publish_track -> set_subscribed -> new_audio_stream -> ...
and the memory leak seems (I'm not sure) to happen after set_subscribed.
I hope the information above helps locate and resolve this problem; otherwise I have to restart my agent server every few hours.
Update: the memory leak also happens on my local machine (M1, macOS 15.3.2) on different Python versions (3.11.7 / 3.12.6 / 3.13.2), and this repo contains the dependencies and detailed reproduction steps.
Yes, I've experienced memory leaks as well. I haven't been able to pin down the exact location of the leak, but it looks like the memory used is not released even after the call is dropped.
Could you describe the environment you are running on? What Linux base image, and what architecture?
Are you able to reproduce it with the standard basic_agent demo?
@davidzhao Hi David, I have run many tests (20+ cases) with different Python versions, and this issue also happened on my local machine (M1, macOS 15.3.2), not only in the Docker container, but I still don't know under which circumstances it happens. The link below is the memory-leak demo repo, which contains the code (the simplest use of LiveKit Agents), some memory usage graphs from my tests, and the dependencies and reproduction steps. I hope this is helpful.
https://github.com/theo1893/livekit-memory-leak-demo
I have not run the standard basic_agent demo yet because I was busy collecting the records for the repo above; I will run the standard demo ASAP to see whether the memory leak also happens there.
@theo1893 I was not able to reproduce the issue with your script following the steps (connect to the agent twice). I tested both 1.0.17 and 1.0.18 multiple times, and the RAM was released after I muted the audio. This was on macOS 15, M4, Python 3.12.
The only difference is that I didn't create a room with your script; connecting from the agent playground automatically creates the room and has the agent join. But I don't think that would cause a different result. Could you try the latest version, use the playground to create the room, and see if you still have the issue?
(memory usage graphs attached for 1.0.17 and 1.0.18)
An important detail to note is that the example above utilizes the THREAD-based JobExecutor. This behavior may not occur when using the default PROCESS-based executor.
https://github.com/theo1893/livekit-memory-leak-demo/blob/4accb4147516139532663bd7bc6839a5c2827d49/start_worker.py#L162
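For context, here is a minimal sketch of how the executor type is chosen when starting a worker. This is not copied from the linked repo; the `job_executor_type` parameter and the `JobExecutorType` import path are my assumptions and may differ between livekit-agents versions.

```python
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.job import JobExecutorType  # import path may vary by version

async def entrypoint(ctx: JobContext):
    await ctx.connect()
    # ... set up the agent session here ...

if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            # The demo uses the THREAD executor; the default PROCESS executor
            # runs each job in its own process, so all of the job's memory is
            # returned to the OS when the participant disconnects.
            job_executor_type=JobExecutorType.THREAD,
        )
    )
```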
@longcw Hi, as I said, I cannot reproduce this leak on demand right now, but I did manage to reproduce it at about 2025-05-04 09:10 (UTC+8), and here is the resulting graph:
Here are my steps:
1. Create a new conda env with python=3.12.6 and install requirements.txt.
2. Execute python start_worker.py.
3. Connect to the room directly via the Playground, without executing create_room.py (as you mentioned, connecting creates a new room, so I don't need to create one manually).
4. Don't say anything, and watch the memory output on the console for a few seconds.
5. Disconnect.
6. Repeat steps 3 to 5 five times. The memory usage was relatively stable during these 5 tests.
7. On the 6th connect-and-watch cycle, the memory leak happened, as the graph above shows.
@theomonnom Hi, this memory leak issue was first observed in our process-based agent server, with dependencies as below:
On this server we observed two things:
- For some reason the memory keeps increasing, which is what we are discussing here. I have upgraded the server to the 1.x API, but the leak still exists.
- The memory usage at server startup was much higher than I expected, and I assumed this was because we were using the process-based executor, which gives each JobProcess an independent memory space. After I changed it to the thread-based executor, the startup memory usage became normal.
In my opinion the executor style only changes the workload type (process or thread), so it may not be related to this issue.
@theo1893 I was able to reproduce the issue after creating and closing jobs multiple times. The issue is that the AudioStream was not closed properly after the job exited. It should already be fixed in 1.0.18; can you try the latest version and see if that fixes the memory leak for you?
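For anyone stuck on an older version, here is a rough defensive sketch of closing audio streams explicitly when the job ends instead of relying on garbage collection. The track-subscribed callback shape and the `aclose()` usage are assumptions about the livekit.rtc Python API, so verify them against your SDK version.

```python
import asyncio
from livekit import rtc

# Streams opened during the job, tracked so they can be closed deterministically.
open_streams: list[rtc.AudioStream] = []

def on_track_subscribed(track: rtc.Track, *_):
    if track.kind == rtc.TrackKind.KIND_AUDIO:
        open_streams.append(rtc.AudioStream(track))

async def close_streams():
    # Close every stream explicitly so FFI-side audio buffers are freed
    # as soon as the job exits, instead of whenever the GC runs.
    await asyncio.gather(*(s.aclose() for s in open_streams), return_exceptions=True)
    open_streams.clear()
```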
@longcw Thank you for your help! I will try the latest version ASAP to see if this issue has been fixed.
@longcw Hi, after upgrading livekit-agents to 1.0.18 the memory still increases across connections on my M1 Mac, although the memory usage is relatively stable for the first several connections.
You can use the PROCESS-based executor for now. I'll investigate it further.
@theo1893 can you try to reproduce it with any of the examples in the repo in dev mode and share the logs when the issue happens? I cannot reproduce it in the latest branch now.
@longcw Sure. This is the log file at 2025-05-06 11:18 (UTC+8) corresponding to the image I posted.
And here is the list of LiveKit-related dependencies:
Sorry, I meant the debug logs of the agent; you can enable them in your script or use the examples in the agents repo.
On the Y axis is memory usage in GB. You can see that my bot in the Docker container seems to run out of memory within minutes, eventually leading to a pod restart, with a maximum of 4 users.
@longcw Hi, sorry for the late reply. I have enabled the debug log in the agent, and here is the log file:
And below is the corresponding graph:
Although the memory usage does decrease on my M1 Mac, there is still continuous memory growth during a connection (after I tried many times), at about 0.1 to 0.2 MB per second, which seems abnormal.
From the log, all the audio streams created were closed properly when the participant disconnected. So either it's a different issue, or it's memory used for some data while the agent is responding. Can you run it for a longer time, like a few hours, to see what the max RAM usage is?
FYI, if you are blocked by this, the process executor shouldn't have the issue, since it closes the process when the participant disconnects; all RAM is released in that case.
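If it helps with the long-run measurement, a small monitor like the sketch below can log the worker's RSS (and its peak) every few seconds so the maximum over a few hours can be read from the log. psutil and the script name are my additions, not part of the demo repo.

```python
import sys
import time

import psutil

def log_rss(pid: int, interval_s: float = 10.0) -> None:
    """Periodically print the current and peak RSS of the target process."""
    proc = psutil.Process(pid)
    peak = 0
    while True:
        rss = proc.memory_info().rss
        peak = max(peak, rss)
        print(f"rss={rss / 2**20:.1f} MiB  peak={peak / 2**20:.1f} MiB", flush=True)
        time.sleep(interval_s)

if __name__ == "__main__":
    # Usage: python log_rss.py <worker_pid>
    log_rss(int(sys.argv[1]))
```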
@longcw Thank you! I will try the process executor after upgrading the libraries in our production instances!
@longcw Hi, I also have a memory leak. I create 4000 rooms each time (the client uses Room objects to connect to the rooms) and then exit, and repeat this several times; you can see that the memory keeps growing each time. Is there still memory not released after connecting through the Room object?
Environment: livekit==1.0.8, livekit_agents==1.0.22, python==3.11
Memory readings per round of 4000 rooms:
- 15.6G -> 18.1G
- 17.9G -> 19.0G
- 18.7G -> 19.7G / 19.5G
this happens even when you run it via Processes? @zhushixia
It has not happened with the process executor, but that occupies too much memory; the leak described above occurs with the thread executor.
@longcw Hi, https://github.com/livekit/agents/issues/1186#issuecomment-2836081048 may help you locate the problem. I am on Ubuntu too.
Experiencing a memory leak as well using processes. It happens pretty slowly, but it happens to all of my agent containers.
Same problem here, seeing a similar graph in my agent too. Running on a Debian 12 host, inside a Docker container.
The Pipecat framework seems to support force_gc=True when disconnecting participants, to avoid similar situations (I guess). Could we track what gets cleaned up and what doesn't with GC?
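One way to check, sketched below, is to force a collection at disconnect time and diff live object counts against a baseline taken at startup. This is plain stdlib gc/collections usage, not an existing LiveKit or Pipecat hook.

```python
import gc
from collections import Counter

def snapshot() -> Counter:
    """Count live objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

# Take a baseline when the job/room starts.
baseline = snapshot()

def report_leftovers(top_n: int = 15) -> None:
    """Call after the participant disconnects: force a collection and show
    which object types are still more numerous than at startup."""
    freed = gc.collect()
    print(f"gc.collect() reclaimed {freed} unreachable objects")
    for name, delta in (snapshot() - baseline).most_common(top_n):
        print(f"  {name:<30} +{delta}")
```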
I used tracemalloc to troubleshoot the issue. I took a snapshot before a room started up, and generated another snapshot when closing the connection, to get the code lines that consumed the most memory. I'm not sure if this log can help with locating the problem.
D:\workspace\codeup\livekit-agent\.venv\Lib\site-packages\livekit\rtc\_ffi_client.py:123: size=7076 KiB (+6882 KiB), count=129122 (+125839), average=56 B
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.13_3.13.1520.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py:853: size=6657 KiB (+6394 KiB), count=65549 (+62955), average=104 B
D:\workspace\codeup\livekit-agent\.venv\Lib\site-packages\livekit\rtc\_ffi_client.py:151: size=6654 KiB (+6391 KiB), count=131026 (+125850), average=52 B
D:\workspace\codeup\livekit-agent\.venv\Lib\site-packages\livekit\rtc\audio_frame.py:62: size=7428 KiB (+5798 KiB), count=25105 (+19377), average=303 B
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.13_3.13.1520.0_x64__qbz5n2kfra8p0\Lib\asyncio\events.py:38: size=4095 KiB (+3933 KiB), count=65520 (+62930), average=64 B
D:\workspace\codeup\livekit-agent\livekit-plugins\livekit-plugins-silero\livekit\plugins\silero\vad.py:304: size=5663 KiB (+2827 KiB), count=4 (+2), average=1416 KiB
D:\workspace\codeup\livekit-agent\.venv\Lib\site-packages\livekit\rtc\audio_frame.py:94: size=1587 KiB (+1252 KiB), count=23335 (+18453), average=70 B
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.13_3.13.1520.0_x64__qbz5n2kfra8p0\Lib\asyncio\sslproto.py:278: size=1280 KiB (+1024 KiB), count=10 (+8), average=128 KiB
D:\workspace\codeup\livekit-agent\.venv\Lib\site-packages\livekit\rtc\audio_frame.py:67: size=751 KiB (+586 KiB), count=12023 (+9380), average=64 B
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.13_3.13.1520.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py:856: size=529 KiB (+508 KiB), count=1025 (+985), average=528 B
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.13_3.13.1520.0_x64__qbz5n2kfra8p0\Lib\asyncio\proactor_events.py:191: size=448 KiB (+192 KiB), count=14 (+6), average=32.0 KiB
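For reference, the kind of comparison above can be produced with a sketch like this (standard library only; where the two snapshots hook into the agent lifecycle is up to you):

```python
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation traceback

# ... snapshot taken before the room starts ...
before = tracemalloc.take_snapshot()

# ... run the job: connect, stream audio, respond, disconnect ...

# ... snapshot taken when closing the connection ...
after = tracemalloc.take_snapshot()

# Lines ranked by memory growth between the two snapshots,
# which is the format of the listing above.
for stat in after.compare_to(before, "lineno")[:15]:
    print(stat)
```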
Thank you guys for your patience. Please try the following alternatives, and if the issue persists, please send me an email and I will take further steps from there.
Do you mean LiveKit (the real-time audio/video SFU)? I'll assume that; if you meant something else, say so and I'll adjust. Below is a focused list of common LiveKit memory-leak causes and a compact, practical step-by-step troubleshooting and fix plan for client (web, Android, iOS) and server (Go) setups.

High-level causes specific to LiveKit:
- Not closing/disposing PeerConnections / Room / Engine objects (clients).
- Not stopping MediaStreamTracks or local capture (getUserMedia streams left running).
- Not detaching/removing renderers (video elements, SurfaceViews, VideoRenderers).
- Event listeners / callbacks left registered on Room, Participants, Tracks.
- Unpublished/unsubscribed tracks still referenced by app code or caches.
- Timers/intervals (stats polling, keepalives) not cleared.
- For Android/iOS: SDK objects (renderers, sinks, Track objects) not released, so native resources remain.
- For the Go server: goroutine leaks (e.g., per-connection loops not exiting), maps/slices that grow unbounded (participant/room caches), network connections not closed, track buffers not flushed/closed.
- WebRTC-specific: RTCPeerConnection, Transceivers, and MediaStreamTracks retained.
- Third-party components holding references (UI frameworks, logging, analytics).

Step-by-step troubleshooting workflow (applies to any platform):
1. Reproduce deterministically: create a small scenario that exercises connect/disconnect, publish/unpublish, join/leave repeatedly.
2. Measure baseline and growth: record memory before, during, and after several cycles; look for progressive growth that doesn't drop after GC.
3. Pick the right profiler/tooling (see below) and capture snapshots over time: take heap snapshots or native memory recordings at intervals (e.g., after each join/leave).
4. Compare snapshots to identify types that grow (PeerConnection, MediaStreamTrack, DOM nodes, custom objects): inspect retained size and reference paths to GC roots.
5. Inspect retention paths to find why objects are reachable: look for event listeners, static caches, global variables, timers, and UI components holding references.
6. Fix: remove/unsubscribe/stop/close and free resources at lifecycle boundaries; add explicit cleanup on leave/dispose/unmount.
7. Re-test and confirm memory stabilizes.
8. Add lifecycle unit tests and monitoring to catch regressions.

Platform-specific detection and fixes:

Web / LiveKit JS
- Tools: Chrome DevTools (Memory panel: heap snapshot, allocation instrumentation, timeline); WebRTC internals (about:webrtc in some browsers).
- Common culprits: RTCPeerConnection, MediaStreamTrack, HTMLVideoElement nodes, Room event listeners.
- Cleanup checklist on leave/dispose: call room.disconnect() (or room.off plus closing the underlying connections); stop local capture (track.stop() / MediaStreamTrack.stop() for each local track); detach and remove video elements (track.detach() or video.srcObject = null, then remove the element); unpublish if needed (localParticipant.unpublishTrack(track)); remove event listeners (room.off(...), participant.off(...)); clear any polling timers (stats intervals).
- Debug steps: reproduce join/leave N times, take heap snapshots after each cycle and compare, then search for retained WebRTC objects (PeerConnection, MediaStreamTrack, HTMLVideoElement).

Android (LiveKit Android SDK)
- Tools: Android Studio Profiler (Memory), dump HPROF and analyze with MAT, LeakCanary for runtime leak detection.
- Common culprits: SurfaceViewRenderer/TextureView leaks, VideoSink still attached, tracks not released, PeerConnection not closed.
- Cleanup checklist: room.disconnect() / room.close() (use the SDK's disconnect method); stop and release local tracks (localVideoTrack.stopCapture()/stop() and localVideoTrack.release(); check the exact SDK methods); remove video sinks (videoTrack.removeSink(renderer)) and call renderer.release() / surfaceView.release() as appropriate; unregister listeners/callbacks from Room/Participants/Tracks; stop polling timers and background handlers.
- Debug steps: use LeakCanary to detect retained Activities or Views; capture an HPROF and inspect retained objects such as PeerConnection, MediaStreamTrack, Activity.

iOS (LiveKit iOS SDK)
- Tools: Xcode Instruments (Allocations, Leaks), memory graph debugger.
- Common culprits: video renderers not released, tracks still active, strong reference cycles (closures).
- Cleanup checklist: room.disconnect() / room.dispose() per the SDK; stop local capture and release tracks (localVideoTrack.stop() / localVideoTrack.release()); remove renderers/sinks and remove them from the view hierarchy; remove listeners/observers and invalidate timers; check closures/delegates for strong reference cycles and use weak references.
- Debug steps: run Instruments while join/leave cycles execute; look for allocations that don't drop and for leaked objects.

LiveKit Server (Go)
- Tools: go pprof (heap, goroutine), /debug/pprof endpoints, go tool pprof -http=:6060, pprof in production, vet for goroutine leaks.
- Common culprits: goroutines blocked on channels, transports or subscriptions not closed, maps/slices retaining participant state, file descriptors not closed.
- Cleanup checklist: ensure connection teardown closes all goroutines (use contexts, cancel on disconnect); close network connections and transports, draining channels where necessary; remove participants from room maps and free structures; avoid unbounded caches (add eviction or TTL); explicitly stop any per-room background workers when the room closes.
- Debug steps: collect goroutine profiles before/after connecting/disconnecting and look for goroutines that pile up; capture a heap profile and inspect large allocations by stack trace.

Concrete examples (conceptual; check your SDK version for exact call names):

Web (JS) cleanup pattern on leaving a room:
- room.offAllListeners();
- for each localTrack: localTrack.stop(); localTrack.detach(); localParticipant.unpublishTrack(localTrack);
- remove video elements from the DOM and null the references;
- room.disconnect(); set room = null;
- clearInterval(statsInterval);

Android (pseudocode) in onDestroy / onLeave:
- room.offAllListeners();
- for each localTrack: localTrack.stopCapture(); localTrack.release();
- for each remoteVideoRenderer: videoTrack.removeSink(renderer); renderer.release();
- room.disconnect(); room = null;
- cancel background handlers/timers.

Go server (pprof usage):
- Enable the pprof handler: import _ "net/http/pprof" and go func() { log.Fatal(http.ListenAndServe(":6060", nil)) }()
- Take a heap profile: go tool pprof http://localhost:6060/debug/pprof/heap
- In the pprof UI: top, web, list
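And since this thread is about the Python SDK and agents, a comparable reproduce-and-measure harness might look like the sketch below. The rtc.Room connect/disconnect calls follow the Python SDK as far as I know, while psutil, the URL, and the token handling are placeholders of mine.

```python
import asyncio
import gc

import psutil
from livekit import rtc

async def join_leave_once(url: str, token: str) -> None:
    room = rtc.Room()
    await room.connect(url, token)
    await asyncio.sleep(5)      # hold the connection briefly
    await room.disconnect()

async def main(url: str, token: str, cycles: int = 20) -> None:
    proc = psutil.Process()
    for i in range(cycles):
        await join_leave_once(url, token)
        gc.collect()            # rule out "not collected yet" as the explanation
        rss_mib = proc.memory_info().rss / 2**20
        print(f"cycle {i + 1}: rss={rss_mib:.1f} MiB")

# asyncio.run(main("wss://<your-livekit-host>", "<access-token>"))
```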