channels Intermittent database errors in tests after updating to 4.2

In my main project I have two websocket consumers which performs a lot of database-related actions.

With channels==4.1.0 my unittests with django were proceeding smoothly without any problem (I have already removed any references to database_sync_to_async since this comment, when I switched to async ORM methods).

After I updated to channels==4.2.0 it started randomly (see later) throwing database errors after the async tests are executed:

django.db.utils.OperationalError: the connection is closed

I use a multiprocess testing and I get errors only in the process which is executing consumer-related tests only after those are exeuted, due to how tests are picked up by workers and how the interpreter/host handles threads this error may not occurr.

Hunting down into channels' history I found that in https://github.com/django/channels/pull/2101 it was introduced a work-around to prevent database_sync_to_async to close old connections by mocking close_old_connections calls in the communicator.

Shortly after, in https://github.com/django/channels/pull/2090 the consumers handle/disconnect got an explicit call to aclose_old_connections in their workflow.

My doubt is that there is an execution path in which the mock performed in the communicator is not effective against the db-closure in the consumer, but I need to get more evidences ti get what is going on and how to fix it (currently I had 1 day of errors on my development machine and now I no longer get this error locally but only on github.actions).

Nov 26 '24 16:11 sevdog

My doubt is that there is an execution path in which the mock performed in the communicator is not effective against the db-closure in the consumer...

Sounds likely from the description. Are you able to reduce it, even to an intermittent failure?

(pytest? Fixtures there have often shown these kind of errors, but I never dug in to see exactly why, since I don't use it myself.)

Nov 27 '24 08:11 carltongibson

When I get more times to check on this I will try to introduce a bit of debug logs to check what is going on with connections and threads (currently I do not have enough tools).

I am using pytest with pytest-xdist/pytest-django to run tests on my project while I still use django-based test-cases because I prefer them to test django. pytest, in that project, is only used for its reporting capabilities which are better than django.

Apart from this I belive that the db-cleanup should have been implemented more like in django, using signals and not by putting a direct call to close_old_connections. Having two different way to handle this task in these projects seems odd, expecially because channels relies on django. I understand that this would mean to drop support for django 4.2, because async signals were introduced in 5.0, maybe this could be done in a 5.0 release of channels?

Nov 29 '24 08:11 sevdog

I understand that this would mean to drop support for django 4.2, because async signals were introduced in 5.0, maybe this could be done in a 5.0 release of channels?

I think we could drop Django 4.2 from the release of 5.2, and maybe even earlier. //cc @bigfootjon

It would be good to pin down exactly what's happening here. close_old_connections shouldn't really be a problem... — the connection handler should give you a new one when you next need it... 🤔

Nov 29 '24 09:11 carltongibson

Hi all,

Same issue here. I have a middleware for authentication and an async websocket consumer with database-related message handler to send data over the wire to a client.

However in the call to the database I get a database connection closed error.

I've also removed the database_sync_to_async decorator calls from my code to prevent closed connections but the issue is something other than this.

Feb 20 '25 12:02 francipvb

@francipvb make sure you read this section of the docs: https://channels.readthedocs.io/en/latest/topics/databases.html#database-connections

Particularly the part about calling close_old_connections periodically

Feb 23 '25 20:02 bigfootjon

Hello,

@francipvb make sure you read this section of the docs: https://channels.readthedocs.io/en/latest/topics/databases.html#database-connections

Particularly the part about calling close_old_connections periodically

I've read this documentation section and the consumers code directly without any success. The code just fails during tests without any good reason.

The database related code fails even inside a websocket middleware.

Feb 28 '25 13:02 francipvb

Do you have a sample project that reproduces the issue? Or sample code at least? It's hard to debug without the code and reproduction steps

Mar 01 '25 23:03 bigfootjon

Hello,

I will try to give you a sample project for this case.

Anyway I solved all connection related errors by writing a TransactionTestCase insthead of a TestCase.

Mar 03 '25 10:03 francipvb

Here is a sample project which I used for an other issue with channels and django TestCase.

I have setup a workflow which tests against channels 4.1.0 and 4.2.0, python 3.10-12 and using postgres or sqlite. Here are the results: https://github.com/sevdog/test-websocker-user-lock/actions/runs/14088653132

When I run against postgres I get the error django.db.utils.OperationalError: the connection is closed. With the current setup I always get the error, while before I was getting it in a more randomic way (dunno it is due to pycopg or django version).

Mar 26 '25 16:03 sevdog

Digging deeper, also adding a bit of stupid print statements (because if I try to use a debugger it will just get errors due to timeout) I have found the following hint:

# class/method threading.get_ident()
ApplicationCommunicator.send_input 133367373362880
ApplicationCommunicator.receive_output 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function no_op at 0x77c1fb000860> 133367373362880
[print statements for the connection phase, 2 threads execute code]
ApplicationCommunicator.send_input 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function close_old_connections at 0x77c1fb01d260>> 133367373362880

Even if everithing seems to be kept in the same thread, the first time it is executed the mock is applied while it is not in the second.

Goind deeper I found that the no_op method was the one setup in the receive_output method, because WebsocetCommunicator.connect calls it very shortly after sending data.

I belive that this may be bound to this issue https://github.com/python/cpython/issues/114033 because mock.patch is a sync context manager and thus if used in an async context may not work as intended. An other reason to suspect this is the cause is my previous finding: the patch being applied to close_old_connection is that configured in the receive_output rather than that in send_input whereas the aclose_old_connections is called inside AsyncCommunicator.dispatch which is an input method.

IMO the current unittest.mock is not suitable for this task and the current implementation fails againsta a real database.

Apr 02 '25 14:04 sevdog

Digging deeper, also adding a bit of stupid print statements (because if I try to use a debugger it will just get errors due to timeout) I have found the following hint:
# class/method threading.get_ident()
ApplicationCommunicator.send_input 133367373362880
ApplicationCommunicator.receive_output 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function no_op at 0x77c1fb000860> 133367373362880
[print statements for the connection phase, 2 threads execute code]
ApplicationCommunicator.send_input 133367373362880
AsyncConsumer.dispatch 133367373362880
aclose_old_connections <function close_old_connections at 0x77c1fb01d260>> 133367373362880
Even if everithing seems to be kept in the same thread, the first time it is executed the mock is applied while it is not in the second.

Goind deeper I found that the no_op method was the one setup in the receive_output method, because WebsocetCommunicator.connect calls it very shortly after sending data.

I belive that this may be bound to this issue python/cpython#114033 because mock.patch is a sync context manager and thus if used in an async context may not work as intended. An other reason to suspect this is the cause is my previous finding: the patch being applied to close_old_connection is that configured in the receive_output rather than that in send_input whereas the aclose_old_connections is called inside AsyncCommunicator.dispatch which is an input method.

IMO the current unittest.mock is not suitable for this task and the current implementation fails againsta a real database.

Hello,

Is there a PR to fix this already?

Thanks

Apr 15 '25 17:04 francipvb

https://justinmayer.com/posts/any-updates/

Apr 15 '25 18:04 carltongibson

Here is just an other side of the issue which IMO highlights how the mock implementation is not the right way to handle this.

Mocking close_old_connections in either send_input/receive_output may fail even if the sync context manager works as expected in an async context. This because either methods just awaits on a queue (which acts as a dummy broker) and not on the methods of the consumer. These are two parallel flows and thus there is no guarantee of the execution order:

Communicator.send_input -> Queue.put
await_many_dispatch -> Queue.get -> Consumer.dispatch

Since everything is handled in threads this may work or not according to how the interpreter and the OS handle sheduling. TL;DR: Consumer.dispatch may be called after Communicator.send_input has returned.

Also we need to remember that a consumer may send a message to an other channel where an other consumer may be waiting and during tests this may cause even more problems.

It would be good to have some kind of semaphore which tells to the channels.db package something like "we are running tests, let the connection status to be handled by test runner".

Jun 19 '25 10:06 sevdog