Django memory leak?
Hi,
It appears that Django's ORM is leaking quite a lot into RAM, and I'm not exactly sure why I can't find anything about this problem.
Let's start with the fix. Wrapping all my tasks with this (largely inspired by Django Channels) resolved the issue:
from functools import wraps

from django.db import close_old_connections, reset_queries

def plug_psycopg_leak(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        close_old_connections()
        reset_queries()
        try:
            return func(*args, **kwargs)
        finally:
            close_old_connections()
            reset_queries()
    return wrapper
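For context, it goes on top of the task registration, something like this (a sketch; the app import and task body are placeholders):

from myproject.procrastinate_app import app  # hypothetical app module

@app.task
@plug_psycopg_leak
def store_blob(blob_id):
    ...  # ORM work that reads/writes the blobs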
This quickly became an issue in my app because it stores big blobs in the DB, and those were basically never freed from memory.
Note: those blobs make sense in my app because they're only read/written from Procrastinate, so I don't really care how much slower the operation is compared to storing them in an S3 bucket, and as a bonus I don't need an S3 bucket.
This is what my memray flamegraph --leaks diagnostic looked like before the fix:
Now all the cursors get freed and I don't see a spot of psycopg in my flamegraph anymore.
The weird thing is, this leak only seems to happen on my production server. When running on my local machine, the leak is nowhere to be seen in my flamegraph, which I guess is consistent with the fact that no one has reported this issue?
Before digging any deeper, I'd like to get the Procrastinate team's point of view on this. What is the intended behavior with regard to clearing Django connections? How come Channels needs it while Procrastinate doesn't mention it?
Thanks!
In Django, database connections are typically managed through the request/response cycle of a view. In background jobs, however, there is no request/response cycle, which means these connections won't get closed automatically. (I'm uncertain about connection pools in this context.) This issue is not unique to Procrastinate; it also affects other distributed task libraries like Celery. Creating a FAQ entry to address this concern is still on my to-do list. I also invoke close_old_connections at specific points in my code to avoid that issue.
It's also important to note that if connections remain open for too long without being used, PostgreSQL may close them, and the next time you attempt to use that closed connection, you'll encounter an OperationalError: the connection is closed. This also highlights the necessity of manually closing old connections when not used anymore.
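In practice that looks something like this (a sketch; app is a Procrastinate app and the task body is a placeholder):

from django.db import close_old_connections

@app.task
def nightly_job():
    ...  # ORM work
    # Drop connections that are broken or past CONN_MAX_AGE, both to
    # release their resources and to avoid a later OperationalError on
    # a connection the server has already closed.
    close_old_connections()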
Definitely ran into that issue as well.
Given that there is a specific Django integration in Procrastinate, wouldn't it be interesting to have this as part of the default implementation?
Something interesting would be to have before/after job hooks (a bit like what this says doesn't exist yet), configured by default to clear the Django connections when using the contrib Django app?
Finding a general solution could be challenging because connections are thread-local, making it difficult to close connections from a different thread. Async tasks run in the main thread, while sync tasks operate in their own separate threads. We would also need to figure out whether we are in a Django project; currently, the worker is completely agnostic about that, and it even manages its own connections (independent of Django). Additionally, this approach may not be effective for long-running background jobs. I need to think about this a bit.
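To illustrate the thread-locality point: each thread lazily opens its own connection, and closing has to happen from the owning thread (a standalone sketch, assuming a configured Django project):

import threading

from django.db import close_old_connections, connection

def use_orm():
    # Each thread lazily opens its own connection object.
    connection.ensure_connection()
    print(threading.get_ident(), id(connection.connection))
    connection.close()  # must happen here, in the owning thread

t = threading.Thread(target=use_orm)
t.start()
t.join()

# This only affects the *main* thread's connection; it cannot reach
# one that a worker thread opened.
close_old_connections()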
In my mind, you'd simply be modifying procrastinate.contrib.django.apps.create_app in order to inject a middleware that closes the connections.
Obviously I might be missing some important notions, yet my decorator seems to be working; so if those middleware/hooks existed, couldn't that fairly simply solve the issue?
> In my mind, you'd simply be modifying procrastinate.contrib.django.apps.create_app in order to inject a middleware that closes the connections.

Unless I'm mistaken, this also impacts non-worker code (such as views). Isn't that something we should be doing in the Django Connector?
To get out of the theoretical space, I did a quick implementation of my idea. As far as I'm concerned it's enough to fix the issue, but I'm far from an expert in all the possible process models Procrastinate and Django can run under.
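The implementation itself is linked rather than inlined here, but the shape of the idea is roughly this (a hypothetical sketch, not Procrastinate's actual API; wrap_job is an invented name):

from django.db import close_old_connections

def wrap_job(run_job):
    # Hypothetical wrapper around the worker's "run one job" step:
    # clean up Django connections before and after every job.
    def wrapped(*args, **kwargs):
        close_old_connections()
        try:
            return run_job(*args, **kwargs)
        finally:
            close_old_connections()
    return wrapped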
Unfortunately, I don't think that this will work. As mentioned, sync tasks run in their own thread. Your close_old_connections only closes the connections in the main thread of the worker, not those in the sync tasks where the connections were opened. This solution would only work for an async task (and would only be reasonable with concurrency set to 1).
And just by the way, I am working on a middleware feature that wraps the task processing.
Okay I see your point.
Once the middleware feature is available, it should become easy to implement I guess :)
@medihack Assuming we run only sync tasks with concurrency=1, will this still leak connections in a Django context?
@jakajancar I can't say much about the memory leak, but according to @Xowap it seems to be related to the unclosed Django database connections. We can't do anything meaningful about that: if Django connections are not used in a Django view, they must be closed manually, and concurrency doesn't matter here. A middleware doesn't help either (that's why I abandoned the idea). I had the same problem in Celery. So I just close the connection when I know that I won't need it again in the short term, and this seems to work well (I haven't had a problem since).
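Something along these lines (a sketch; app is a Procrastinate app and the task body is a placeholder):

from django.db import connection

@app.task
def sync_task():
    ...  # ORM work
    # Explicitly close this thread's connection once the DB work is
    # done, so it doesn't linger in the task's thread. Django will
    # transparently reopen it on the next query.
    connection.close()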
What's the value of CONN_MAX_AGE in the settings.py used for your workers?
Have you enabled support for the thread-safe psycopg connection pools added in Django 5.1? It's off by default but will handle this behavior better.
Enabling the connection pool and setting CONN_MAX_AGE to None will give you a persistent connection pool that works across threads.
@TkTech I'm actually curious if you have some source on the interaction of CONN_MAX_AGE and Django + psycopg3 pools? I wasn't able to easily find how the two features interact :)
Actually: reading the code explains the code :D
https://github.com/django/django/blob/787f3130f751283140fe2be8188eb5299552232d/django/db/backends/postgresql/base.py#L184-L194
(Makes me wonder if you can set CONN_MAX_AGE to None or if you have to either remove it or set it to zero, and I can't check right now)
> Actually: reading the code explains the code :D
> https://github.com/django/django/blob/787f3130f751283140fe2be8188eb5299552232d/django/db/backends/postgresql/base.py#L184-L194
> (Makes me wonder if you can set CONN_MAX_AGE to None or if you have to either remove it or set it to zero, and I can't check right now)
Looks like in the current version it needs to be 0 or unset, and you use the pool options instead.
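Concretely, that linked check means a configuration along these lines (a sketch for Django >= 5.1 with psycopg 3 and the psycopg-pool package installed; the database name is a placeholder):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        # Leave CONN_MAX_AGE at its default of 0: pooling and
        # persistent connections are mutually exclusive.
        "OPTIONS": {
            "pool": True,  # or a dict of psycopg_pool.ConnectionPool kwargs
        },
    }
}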
I understand Django has a thread-scoped connection.
My question is whether, if I am running Procrastinate with concurrency=1 and only sync tasks, those tasks are guaranteed by Procrastinate to execute on the same thread.
> My question is whether, if I am running Procrastinate with concurrency=1 and only sync tasks, those tasks are guaranteed by Procrastinate to execute on the same thread.
No; since we switched to thread_sensitive=False, all sync tasks run in their own thread (even with concurrency=1).
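That's asgiref's executor behavior; a rough sketch of the distinction, assuming sync_to_async is what runs sync tasks under the hood:

import asyncio

from asgiref.sync import sync_to_async

def my_sync_task():
    ...  # placeholder sync task body

async def main():
    # With thread_sensitive=False, each call may land on a fresh
    # executor thread, so a thread-local DB connection opened during
    # one task run isn't necessarily reachable from the next.
    await sync_to_async(my_sync_task, thread_sensitive=False)()

asyncio.run(main())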
My CONN_MAX_AGE is currently 60, so I'll try to fiddle with that and the new pool feature as well. That said, my decorator does do the trick so far.