Memory leak in Prefect server?
Bug summary
Hello Prefect! I think you can clearly see from the memory graph when I migrated to Prefect 3.
This is the memory usage of my Prefect 3 server running on an AWS ECS task. The instance has 2 vCPU and 4 GB of memory.
Version info
Version: 3.4.9
API version: 0.8.4
Python version: 3.12.8
Git commit: 1001c54f
Built: Thu, Jul 17, 2025 09:48 PM
OS/Arch: darwin/arm64
Server type: server
Pydantic version: 2.11.7
Integrations:
prefect-aws: 0.5.12
prefect-gcp: 0.6.8
prefect-sqlalchemy: 0.5.3
Additional context
I am sorry to say this, but Prefect 3 server is pretty unstable, and I am not sure if I would consider it production-ready. Prefect 2 was a gem in comparison.
Hey @mattiamatrix! The events system we added in Prefect 3 runs in memory by default and could be causing the behavior you're seeing. I recommend looking at our guide for scaling a self-hosted Prefect server. In particular, running a Redis server for event messaging should help a lot. Let me know if you continue to see elevated memory usage after adding Redis.
I would not recommend enabling Redis event messaging in a multi-server setup right now. We just tried it and ran into high CPU load issues due to a lot of temporary server instances being spawned inside each server pod. Disabling Redis fixed that behavior. I think this might be related to https://github.com/PrefectHQ/prefect/issues/18654
@AlexanderBabel Can you elaborate on the temporary server instances you're seeing? That shouldn't be happening, and any additional info you can share will help me track down the source of the issue.
We deployed Redis and added the following env variables:
global:
  prefect:
    env:
      - name: PREFECT_SERVER_ALLOW_EPHEMERAL_MODE
        value: "False"
      - name: PREFECT_MESSAGING_BROKER
        value: prefect_redis.messaging
      - name: PREFECT_MESSAGING_CACHE
        value: prefect_redis.messaging
      - name: PREFECT_REDIS_MESSAGING_HOST
        value: prefect-redis-master.3p-prefect.svc.cluster.local
      - name: PREFECT_REDIS_MESSAGING_PORT
        value: "6379"
      - name: PREFECT_REDIS_MESSAGING_DB
        value: "0"
      - name: PREFECT_REDIS_MESSAGING_PASSWORD
        valueFrom:
          secretKeyRef:
            name: prefect-redis-secret
            key: redis-password
After analyzing the logs and metrics, I saw that we put high load on the database, which pushed it into recovery mode.
Eventually, the prefect-server pods had trouble regaining access to the Redis queue, as you can see in the attached logs.
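(Editor's note, not from this thread: if you are not on Kubernetes, for example on ECS or docker compose, the Helm values above should translate to plain environment variables roughly as follows. The host and password values are placeholders to substitute with your own.)
PREFECT_SERVER_ALLOW_EPHEMERAL_MODE=False
PREFECT_MESSAGING_BROKER=prefect_redis.messaging
PREFECT_MESSAGING_CACHE=prefect_redis.messaging
PREFECT_REDIS_MESSAGING_HOST=<your-redis-host>
PREFECT_REDIS_MESSAGING_PORT=6379
PREFECT_REDIS_MESSAGING_DB=0
PREFECT_REDIS_MESSAGING_PASSWORD=<your-redis-password>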
Hi, this looks very much like our experience in #18654. I'm not sure whether there were multiple prefect server processes running in one pod container, but the CPU usage definitely went above one core, so I'm guessing additional processes or threads were spawned (not necessarily prefect-server). Listing the processes in a container shows only the entrypoint on PID 1 and the prefect server run by the entrypoint script on PID 6 or 7, so possibly some threads.
Hi,
I deployed the new version yesterday and was able to activate Redis messaging again. Our setup uses drastically less RAM now. Thanks to the team for the quick responses and for fixing the issue. Highly appreciated!
Thank you for the suggestion @desertaxle.
I think it would be appreciated if the Prefect team could dig into the actual issue that's causing this memory leak in the Prefect 3 server, as some of us might not be interested in "scaling self-hosted Prefect" or in having to introduce more infrastructure like a Redis instance.
Thank you.
@mattiamatrix I may have fixed the issue in https://github.com/PrefectHQ/prefect/pull/18679 which was released with 3.4.12. Can you try running that version in standalone mode and see if you still see an issue with memory usage? If you're still seeing the issue, I'll investigate further.
👍 I upgraded to 3.4.12 this morning, I'll update you in a couple of days 🤞
Hi @AlexanderBabel, is your Redis instance's memory doing well? Are you not seeing the same problem as in https://github.com/PrefectHQ/prefect/issues/18654#issuecomment-3185809164?
Hi @lucasbelo777,
it looks like the memory leak has now moved to Redis instead.
@AlexanderBabel there should be a fix for that in #18642, which will go out in the next prefect-redis release.
@desertaxle I let it run for a few days, but sadly, I see no difference.
Thanks for reporting back @mattiamatrix! Since others are seeing reasonable memory usage when using Redis for the messaging layer, I'll dig into our in-memory messaging layer and see if I can find any places where we're holding onto messages longer than we should.
@desertaxle Thanks for pointing out the fix. We deployed the .14 version yesterday and saw a massive drop in memory usage on our Redis instance. Thanks again for pushing out fixes that quickly!
I'm experiencing the Redis memory leak as well:
I'm experiencing the Redis memory leak too. It keeps growing until memory maxes out and AKS kills the pod. The pod then tries to restart but fails, because it loads everything back into memory, which causes a complete outage.
Any guidance on configuring the Redis server so that it manages its own memory better? What exactly could be causing it to hold on to memory?
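(Editor's note, not an answer from the Prefect team, just a general Redis observation: the instance can be capped so it degrades by dropping or refusing data instead of being OOM-killed, via the maxmemory and maxmemory-policy directives in redis.conf. The values below are illustrative.)
# Illustrative redis.conf settings; pick a cap that fits your instance.
maxmemory 1gb
# Evict least-recently-used keys once the cap is reached. For a messaging
# workload this can silently drop pending events; "noeviction" would instead
# reject writes and surface the backlog as errors.
maxmemory-policy allkeys-lru
Either way this only bounds the damage; it does not address whatever is holding on to the streams.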
We also have a memory leak running Prefect 3.4.14...
We are also observing the memory leak running 3.4.8 without Redis.
It seems to be fixed with 3.4.19
I have been running 3.4.19 for a few days, and nothing has changed. I didn't see anything mentioned in the release changelog related to this issue, so I didn't really expect any improvement.
As a reminder, I opened this issue specifically about the Prefect server without Redis, because I do not wish to add a Redis server that would increase my AWS costs for little to no reason. Prefect 2 was working perfectly well on this front.
In fact, @desertaxle, this seems to be related to the new "events" system. Is it possible to disable it?
I'm not sure what the experience of other people here is, but the new Event Feed page at <prefect-url>/events is quite limited, as I'm unable to load more than 1 hour's worth of events.
@mattiamatrix Can confirm we still have memory leak :/
Also seeing the memory leak in version 3.4.22.
I can confirm that nothing has changed with 3.4.22! It's becoming frustrating.
@desertaxle, could the Python version have any impact? I am currently using the image prefecthq/prefect:3.4.22-python3.12.
I think I've narrowed down the issue to some consumers either not being able to keep up with the volume of events or crashing and not consuming messages.
In https://github.com/PrefectHQ/prefect/pull/19136, I put a cap on the size of queues for the in-memory messaging implementation. That should prevent runaway memory growth, but it will result in dropped messages if consumers aren't keeping up. There will be warning logs if messages are dropped, which should help us track down which consumer(s) are causing this issue.
If you see a warning log like Subscription queue is full, dropping message for topic=%r after upgrading to 3.4.24 (not yet released), please post it here to help with troubleshooting.
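(Editor's note, to illustrate what that change does: the following is a hypothetical sketch, not the actual code from #19136; the class name, queue size, and topic handling are made up. The idea is that a bounded per-subscriber queue drops new messages once it is full instead of letting the backlog grow without limit, and logs the warning mentioned above so the slow consumer can be identified.)
import asyncio
import logging

logger = logging.getLogger(__name__)

class BoundedSubscription:
    """Hypothetical sketch of a capped in-memory subscription queue."""

    def __init__(self, topic: str, maxsize: int = 10_000):
        self.topic = topic
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    def publish(self, message: bytes) -> None:
        # Never block the publisher: if the consumer has fallen behind and
        # the queue is full, drop the message and log a warning instead of
        # letting the backlog grow unbounded in memory.
        try:
            self._queue.put_nowait(message)
        except asyncio.QueueFull:
            logger.warning(
                "Subscription queue is full, dropping message for topic=%r",
                self.topic,
            )

    async def get(self) -> bytes:
        # Consumers await messages as usual; only the producer side changes.
        return await self._queue.get()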
@desertaxle, I have been running 3.4.24 for a couple of days, and I see no improvement. I also don't see the warning log that you mentioned in your last message. 😮💨
@desertaxle Same here. 3.4.24 still causes the memory leak.
We have set up Redis for the messaging layer and updated to version 3.4.24. Although Redis memory was doing well, the problem very clearly persists in the server memory. We have had this problem since we moved to Prefect 3. It is actually very frustrating.
Question: why does applying these settings have no effect on the memory leak?
PREFECT_SERVER_SERVICES_EVENT_PERSISTER_ENABLED=false
PREFECT_SERVER_SERVICES_EVENT_LOGGER_ENABLED=false
@msa980, are you saying that adding Redis did not fix the memory leak? I do not want to add another piece of infrastructure, but I was close to testing that as a last resort.
For the record, the memory leak is still present in 3.5.0.
@desertaxle @zzstoatzz, is there anything on your side that could help?
@mattiamatrix exactly, Redis did not solve the problem. Memory consumption is growing at the same pace as before.
Question: why does applying these settings have no effect on the memory leak?