OperationalError: the connection is closed
Hey @ewjoachim and @medihack, here's an interesting error we got from 2 AM to 10 AM this morning: OperationalError: the connection is closed. The stack trace is below. I think it might be related to #1134. We have been using the pool feature in Django 5.1 for a couple of weeks now (together with python manage.py procrastinate worker), but this is the first time this actually happened. As far as I can tell, the worker kept reusing a dead connection for some reason, so every time a task ran (every X seconds), this error would be thrown. But I think it is even more mysterious than that, because some of the tasks succeeded, indicating that one of the worker processes had a good connection while the other had a bad one (we run two processes on the same server). The solution was simply to kill those two workers and start two new ones, and that solved the problem.
So, is there any way that, when that error is thrown, Procrastinate can get a new connection from the pool rather than simply raising an error? I was under the impression that Procrastinate manages the pool for the worker, not Django, so this seems doable to me.
OperationalError: the connection is closed
  File "django/db/backends/base/base.py", line 298, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/postgresql/base.py", line 433, in create_cursor
    cursor = self.connection.cursor()
  File "psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "psycopg/_connection_base.py", line 524, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
OperationalError: the connection is closed
  File "procrastinate/worker.py", line 281, in run_job
    task_result = await await_func(*job_args, **job.task_kwargs)
  File "server/blueprint.py", line 80, in wrapper
    raise error
  File "server/blueprint.py", line 53, in wrapper
    result = await (
  File "concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "line_profiler/line_profiler.py", line 150, in wrapper
    result = func(*args, **kwds)
  File "server/tasks/event_records.py", line 18, in recurring_event_record_processing
    EventStreamService.process_records()
  File "server/services/aws/event_stream.py", line 60, in process_records
    for event_record in event_records:
  File "django/db/models/query.py", line 400, in __iter__
    self._fetch_all()
  File "django/db/models/query.py", line 1928, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
  File "django/db/models/sql/compiler.py", line 1572, in execute_sql
    cursor = self.connection.cursor()
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/base/base.py", line 320, in cursor
    return self._cursor()
  File "django/db/backends/base/base.py", line 297, in _cursor
    with self.wrap_database_errors:
  File "django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/db/backends/base/base.py", line 298, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "django/db/backends/postgresql/base.py", line 433, in create_cursor
    cursor = self.connection.cursor()
  File "psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "psycopg/_connection_base.py", line 524, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
If I understand it correctly, the Procrastinate worker uses its own connection pool independent of the Django connection pool (it just fetches the connection parameters from Django). But I wonder if this error originates from the worker or from within the task and thereby a Django connection (from your traceback: File "django/db/models/query.py", line 400, in __iter__). Are those very long-running tasks?
Very possibly! But now I'm confused. Are you saying the task itself is executed using a Django connection, rather than a Procrastinate one? Not all the tasks are long-running, though, and this has only started happening over the past couple of weeks, so it is odd for sure. What would happen is that every minute a short task would run (10 seconds or less), and it effectively used a dead connection. Maybe there is a way within Procrastinate to force the connection to be released? The stack trace bubbles up from Django to Procrastinate.
> Very possibly! But now I'm confused. Are you saying the task itself is executed using a Django connection, rather than a Procrastinate one?
Yes, I think so. Procrastinate extracts the connection parameters from a Django connection, but then uses its own connection pool for everything that is worker-related. However, when you access the Django ORM in a task, it uses the Django connections (or the Django pool, if it is set up in Django 5.1).
As a sync task is run in its own thread, the Django connections won't be automatically closed (there is no request-response cycle like in a Django view, where they are closed automatically). So, from my understanding (and from reading some of the many issues on the web), one has to close them manually. That's why we call django.db.close_old_connections() in our tasks before an expected long-running calculation, so that the connections are re-opened when needed again. As stated in #1134, CONN_MAX_AGE may work, too. Procrastinate can't do anything about that, as it has no information on when those Django connections are needed and when they are not.
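To illustrate the pattern described above, here is a minimal sketch. The decorator and task names (`with_fresh_connections`, `process_records_task`) are hypothetical; in a real project the cleanup hook would be `django.db.close_old_connections`, which is stubbed here so the example stays self-contained and runs without Django installed:

```python
# Sketch: close stale Django connections before a long-running task body.
# The real cleanup hook would be django.db.close_old_connections; a stub
# stands in here so the example is self-contained.
import functools


def with_fresh_connections(cleanup):
    """Hypothetical helper: run `cleanup` before the wrapped task body."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cleanup()  # drop stale/dead connections; Django reopens them lazily
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Stand-in for django.db.close_old_connections, which (per the Django docs)
# closes connections past CONN_MAX_AGE or in an unusable state.
calls = []
def fake_close_old_connections():
    calls.append("closed")


@with_fresh_connections(fake_close_old_connections)
def process_records_task():
    return "processed"
```

In an actual task you would decorate the Procrastinate task body (or simply call close_old_connections() at its start) so that ORM queries never reuse a connection that died while the worker sat idle.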
@ewjoachim When Procrastinate fetches the connection parameters from Django, it creates a new (Django) connection. I wonder if we should explicitly close it after we have the parameters. I don't think it has anything to do with the reported issue, but it could still be problematic, as this connection may also time out.
This might be a solid lead.
I dug a bit deeper, but I was wrong. When the connection parameters are fetched from Django, no connection is opened. Contrary to the name create_connection, it just creates a database wrapper that fetches the database parameters from the Django settings (that's why it's named wrapper in Procrastinate 😂).
@paulzakin I still recommend manually closing those Django connections. Please report back if the issue still occurs.
Huh! OK, weird. I thought the pool in Django would do that. It's a new feature, though, so maybe it is a bit buggy on their end. Would there be a good place within Procrastinate to close those connections, do you think?
Or should I try to intercept them somewhere else?
> Huh! OK, weird. I thought the pool in Django would do that. It's a new feature, though, so maybe it is a bit buggy on their end. Would there be a good place within Procrastinate to close those connections, do you think?
> Or should I try to intercept them somewhere else?
I can't say much about the new pool feature, but it looks like the errors we had to deal with (without using the pool setting). Those disappeared entirely when we manually closed the connections at specific points in our tasks. But I also saw some code somewhere that caught the error, closed the connection, and retried.
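That catch-close-retry pattern might look roughly like the sketch below. Everything here is an assumption for illustration: the helper name `run_with_reconnect` is made up, and a stdlib `ConnectionError` plus a stubbed cleanup stand in for `django.db.utils.OperationalError` and `django.db.close_old_connections` so the example is self-contained:

```python
# Hedged sketch: catch a dead-connection error, close connections, retry once.
# ConnectionError and cleanup_stub stand in for Django's
# django.db.utils.OperationalError / django.db.close_old_connections.
def run_with_reconnect(task_body, cleanup, retries=1):
    """Call task_body(); on a connection error, run cleanup() and retry."""
    for attempt in range(retries + 1):
        try:
            return task_body()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the original error
            cleanup()  # close the dead connection; it is reopened lazily


# Demo: the first call fails as if the pooled connection had died.
state = {"calls": 0, "cleanups": 0}

def flaky_task():
    state["calls"] += 1
    if state["calls"] == 1:
        raise ConnectionError("the connection is closed")
    return "ok"

def cleanup_stub():
    state["cleanups"] += 1

result = run_with_reconnect(flaky_task, cleanup_stub)
```

A single retry is usually enough here, because closing the broken connection forces Django to open a fresh one on the next ORM query.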
I also wonder how the Django pool feature relates to CONN_MAX_AGE and CONN_HEALTH_CHECKS (see "Connection management" in the Django docs). The main problem is that Django database connections are built around the request-response cycle, which is not present inside a background task.
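For reference, these are the settings being discussed, sketched as a hedged DATABASES example (values are illustrative, not a recommendation; the database name is hypothetical, and as far as I understand the Django 5.1 docs, pooling is used instead of persistent connections, so "pool" and a nonzero CONN_MAX_AGE are not combined here):

```python
# Hedged sketch of the settings under discussion (illustrative values only).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",  # hypothetical database name
        # Variant A (what this thread uses): the Django 5.1 psycopg pool.
        # Pooling replaces persistent connections, so CONN_MAX_AGE stays
        # at its default of 0 in this variant.
        "OPTIONS": {"pool": True},
        # Variant B (no pool): persistent connections with health checks,
        # so Django verifies a connection is alive before reusing it.
        # "CONN_MAX_AGE": 600,
        # "CONN_HEALTH_CHECKS": True,
    }
}
```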
When I find time, I can play a bit with long-running tasks that access the Django ORM (with and without the pool option). Do you use the default pool ("pool": True) or pass in your own pool?
No rush, and thanks a bunch! We use the default "pool": True, nothing fancy. We can call the close function at the end of each long-running task, which makes sense, but I do feel this is something other Procrastinate users will stumble into for sure. It's probably worth having some logic in Procrastinate so that when OperationalError("the connection is closed") is raised, it closes the Django connection as well? That might be what we end up implementing if it happens a bunch more.
There's so much changing on the way we handle everything with v3 that it will be worth checking if the issue still happens once we start releasing v3 beta.
We also haven’t gotten this error in a long time so I’m happy to close it!
Unfortunately, @ewjoachim and @medihack, this reuse of a "dead connection" occurred again last night. Same as #1134, I would imagine. And we have been on v3 for a long time now, so I don't think that's it.
Here is what I know about it at this point:
- It is not a problem with the database itself.
- Killing the process and starting a new process solves the problem.
- It seems to occur randomly (very randomly), with no pattern I can tell.
I can call django.db.close_old_connections() in the task when the exception occurs and see if it happens again. But maybe this sort of thing should be in the core of Procrastinate?
Adding django.db.close_old_connections() seemed to have solved the problem :) I will reopen this if it occurs again. Thanks, guys!
Nope, it did not. django.db.close_old_connections() did not do the job :(
I'm going to do some more investigating and see if I can replicate this locally. As far as I can tell, Procrastinate loses the DB connection for maybe a couple of seconds, freaks out, and then must be restarted. It's such a rare error (maybe once per month) that it is not that urgent, but I thought I would report it all the same.