procrastinate icon indicating copy to clipboard operation
procrastinate copied to clipboard

Django OperationalError: the connection is closed

Open nicokant opened this issue 1 year ago • 10 comments
trafficstars

I have a django application using procrastinate with the following dependencies:

python 3.11

psycopg[binary]==3.1.19
django==5.0.6
procrastinate[django]==2.9.1

When I start the worker it works fine, but after some time I start getting this error

Logs:


INFO 2024-07-30 08:42:38,237 worker 21 139709328389952 Job import_dataset[25](id='35') ended with status: Error, to retry, lasted 0.001 s
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 294, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 332, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 852, in cursor
    self._check_connection_ok()
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 485, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
psycopg.OperationalError: the connection is closed

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
 File "/usr/local/lib/python3.11/site-packages/procrastinate/worker.py", line 281, in run_job
    task_result = await await_func(*job_args, **job.task_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/procrastinate/utils.py", line 108, in sync_to_async
    return await sync.sync_to_async(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asgiref/sync.py", line 468, in __call__
    ret = await asyncio.shield(exec_coro)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/asgiref/sync.py", line 522, in thread_handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/viltkamera_admin/wild_cameras/tasks.py", line 237, in import_dataset
    dataset = models.Dataset.objects.get(id=id)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 645, in get
    num = len(clone)
          ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 382, in __len__
    self._fetch_all()
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 1928, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 316, in cursor
    return self._cursor()
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 293, in _cursor
    with self.wrap_database_errors:
  File "/usr/local/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/base.py", line 294, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 332, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 852, in cursor
    self._check_connection_ok()
  File "/usr/local/lib/python3.11/site-packages/psycopg/connection.py", line 485, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
django.db.utils.OperationalError: the connection is closed

I'm running the worker using the following command:

python manage.py procrastinate worker -n import -q import

nicokant avatar Jul 30 '24 07:07 nicokant

I had something similar in a long-running task. After some time, the Django database connection times out, and that error occurs. I worked around this issue by calling django.db.close_old_connections() in code locations I knew when no database requests were expected for a longer time. That closes the connections explicitly, and Django will reopen them when needed. Does this help in your case, too?

medihack avatar Jul 30 '24 08:07 medihack

Thank you for the advice, I'll try using the same approach. I'll also try to change the CONN_MAX_AGE from 0 to None inside the queue to make sure the connection is persistent

nicokant avatar Jul 30 '24 08:07 nicokant

@paulzakin recently asked something on CONN_MAX_AGE, maybe you both have the same issue?

ewjoachim avatar Jul 30 '24 09:07 ewjoachim

@nicokant Is the problem solved by django.db.close_old_connections() or CONN_MAX_AGE? If so, I will add a note to our official FAQ and close the issue.

medihack avatar Aug 19 '24 19:08 medihack

We switched to using the database pool introduced in Django 5.1 - which seems to work well. Worth a shot for anyone else

paulzakin avatar Aug 19 '24 19:08 paulzakin

The release 2.13.0 (until 2.13.2) doesn't work anymore with the new connection pool of Django 5.1. My test suite passes with 2.12.0.

connection = self.pool.getconn()

raises a PoolTimeout.

stephane avatar Aug 21 '24 08:08 stephane

@stephane Since 2.13.0, we run a sync task in its own thread (otherwise, sync tasks would block each other in concurrency mode as all run in the main thread). Where do you initialize the pool? Do you use the Django command ./manage.py procrastinate worker to start the worker? I wonder if we need some hook in the worker to help with this. I am not using the new pool feature yet, but maybe we can work something out together. We are also considering switching sync tasks to separate processes (#1161) for a version 3 release, but every sync task would need its own connection pool (and there are other pitfalls, too).

medihack avatar Aug 21 '24 09:08 medihack

Hello friends! Unfortunately I also stumbled upon this error. :(

I have added

         "OPTIONS": {
            "pool": True,
        },

to my django database connection but I still get this error! It seems that when I start the worker it is fine but after some time (haven't tested how much time) it breaks and gives me error:

Nov 28 09:44:46 isktheea2 python[1180272]: django.db.utils.OperationalError: consuming input failed: server closed the connection unexpectedly
Nov 28 09:44:46 isktheea2 python[1180272]:         This probably means the server terminated abnormally
Nov 28 09:44:46 isktheea2 python[1180272]:         before or while processing the request.

Is there a way to fix / improve this ? If not, is there a way to check if the connection is closed and re-open it ?

TIA

spapas avatar Nov 29 '24 08:11 spapas

Getting this issue currently, connecting Django workers to the DB via pgBouncer.

It seems that, when the task gets running, the first async ORM query can't get a DB cursor.

Package details

procrastinate: 2.15.1
django: 4.2.11
psycopg: 3.2.4
python: 3.12.3

Logs

DEBUG 2025-02-19 17:17:46,869 procrastinate.worker.wait_for_job:172 # Waiting for new jobs on all queues
DEBUG 2025-02-19 17:17:51,874 procrastinate.worker.process_job:189 # Loaded job info, about to start job hub.tasks.import_all[522381](request_id='f33a88bc-80d6-4e0d-a78c-cd25ca339a6b', requested_at='2025-02-19T17:17:50.116898+00:00', external_data_source_id='984f0491-2c9a-474b-8791-e76ba8d40c77')
INFO 2025-02-19 17:17:51,874 procrastinate.worker.run_job:259 # Starting job hub.tasks.import_all[522381](request_id='f33a88bc-80d6-4e0d-a78c-cd25ca339a6b', requested_at='2025-02-19T17:17:50.116898+00:00', external_data_source_id='984f0491-2c9a-474b-8791-e76ba8d40c77')
ERROR 2025-02-19 17:17:51,881 procrastinate.worker.run_job:344 # Job hub.tasks.import_all[522381](request_id='f33a88bc-80d6-4e0d-a78c-cd25ca339a6b', requested_at='2025-02-19T17:17:50.116898+00:00', external_data_source_id='984f0491-2c9a-474b-8791-e76ba8d40c77') ended with status: Error, lasted 0.007 s
Traceback (most recent call last):
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 308, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 330, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/_connection_base.py", line 532, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
psycopg.OperationalError: the connection is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/asgiref/sync.py", line 518, in thread_handler
    raise exc_info[1]
  File "/opt/render/project/src/hub/tasks.py", line 32, in wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/hub/tasks.py", line 251, in import_all
    source = await ExternalDataSource.objects.aget(id=external_data_source_id)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/query.py", line 649, in aget
    return await sync_to_async(self.get)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/asgiref/sync.py", line 468, in __call__
    ret = await asyncio.shield(exec_coro)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/python/Python-3.11.8/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/asgiref/sync.py", line 522, in thread_handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/query.py", line 633, in get
    num = len(clone)
          ^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/query.py", line 380, in __len__
    self._fetch_all()
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/query.py", line 1881, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/polymorphic/query.py", line 58, in _polymorphic_iterator
    o = next(base_iter)
        ^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/query.py", line 91, in __iter__
    results = compiler.execute_sql(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
           ^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 307, in _cursor
    with self.wrap_database_errors:
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 308, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 330, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/_connection_base.py", line 532, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
django.db.utils.OperationalError: the connection is closed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 308, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 330, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/_connection_base.py", line 532, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
psycopg.OperationalError: the connection is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/procrastinate/worker.py", line 281, in run_job
    task_result = await await_func(*job_args, **job.task_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/hub/tasks.py", line 53, in wrapper
    counts = await ProcrastinateJob.objects.aaggregate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/asgiref/sync.py", line 468, in __call__
    ret = await asyncio.shield(exec_coro)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/python/Python-3.11.8/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/asgiref/sync.py", line 520, in thread_handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/query.py", line 592, in aggregate
    return self.query.chain().get_aggregation(self.db, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/sql/query.py", line 554, in get_aggregation
    result = compiler.execute_sql(SINGLE)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 330, in cursor
    return self._cursor()
           ^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 307, in _cursor
    with self.wrap_database_errors:
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/base/base.py", line 308, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 330, in create_cursor
    cursor = self.connection.cursor()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/connection.py", line 213, in cursor
    self._check_connection_ok()
  File "/opt/render/project/src/.venv/lib/python3.11/site-packages/psycopg/_connection_base.py", line 532, in _check_connection_ok
    raise e.OperationalError("the connection is closed")
django.db.utils.OperationalError: the connection is closed
DEBUG 2025-02-19 17:17:51,952 procrastinate.worker.process_job:236 # Acknowledged job completion hub.tasks.import_all[522381](request_id='f33a88bc-80d6-4e0d-a78c-cd25ca339a6b', requested_at='2025-02-19T17:17:50.116898+00:00', external_data_source_id='984f0491-2c9a-474b-8791-e76ba8d40c77')
DEBUG 2025-02-19 17:17:51,955 procrastinate.worker.wait_for_job:172 # Waiting for new jobs on all queues

janbaykara avatar Feb 19 '25 17:02 janbaykara

I ended up firing up new processes in my tasks to work around this. Not great from an efficiency perspective, but it works for what I'm trying to do.

@app.periodic(cron="12 7 *  *  ?")
@app.task
def the_task(timestamp): 
    os.system("python3 manage.py <management-task>")

ryfow avatar Mar 03 '25 14:03 ryfow

We stumbled onto this recently. Our app is running on AWS and we first only directly connected to the RDS instance with CONN_MAX_AGE=0. And it worked without any obvious issues.

We recently introduced the RDS Proxy in front of our RDS instance, and that had a setting that killed idle connections after 30 minutes. This caused the OperationalError: the connection is closed issue all over the place. A solution that seems to work is setting CONN_MAX_AGE=>0 but then also calling close_old_connections() before the task runs... so it does not try to use the potentially disconnected connection.

What confuses me (and I need to look into the Django source code still)... is why CONN_HEALTH_CHECKS=True does not help and establish a new connection. I guess that only works for the request/response cycle in Django?

In any case... It would be great to have this documented somewhere more explicitly. It was not obvious to me how exactly to approach this by just reading the documentation.

TomyMMX avatar Nov 13 '25 23:11 TomyMMX