channels_redis
Sending messages is slow when using groups
It seems that groups slow down sending messages by roughly 100×:
I've created 3 different echo consumers in order to compare them:
- AsyncJsonWebsocketConsumer with groups
- AsyncConsumer with groups
- AsyncConsumer without groups
I got these timing results:
1. `AsyncJsonWebsocketConsumer` with groups:

```
Elapsed time 0.026857614517211914.
Elapsed time 0.026637554168701172.
Elapsed time 0.03318667411804199.
Elapsed time 0.03062891960144043.
Elapsed time 0.026413679122924805.
Elapsed time 0.028857707977294922.
Elapsed time 0.03323197364807129.
Elapsed time 0.026538610458374023.
Elapsed time 0.036324501037597656.
Elapsed time 0.0322566032409668.
Elapsed time 0.0385587215423584.
Elapsed time 0.02066636085510254.
Elapsed time 0.03473186492919922.
Elapsed time 0.03095531463623047.
Elapsed time 0.022891759872436523.
```
2. `AsyncConsumer` with groups:

```
Elapsed time 0.02672290802001953.
Elapsed time 0.022529125213623047.
Elapsed time 0.042157888412475586.
Elapsed time 0.028131961822509766.
Elapsed time 0.035176753997802734.
Elapsed time 0.03284716606140137.
Elapsed time 0.03023362159729004.
Elapsed time 0.03440999984741211.
Elapsed time 0.029916763305664062.
Elapsed time 0.03461766242980957.
```
3. `AsyncConsumer` without groups:

```
Elapsed time 0.00019168853759765625.
Elapsed time 9.107589721679688e-05.
Elapsed time 0.00020265579223632812.
Elapsed time 0.0004429817199707031.
Elapsed time 0.0002741813659667969.
Elapsed time 0.00019693374633789062.
Elapsed time 0.00026679039001464844.
Elapsed time 0.0003097057342529297.
Elapsed time 0.00017261505126953125.
Elapsed time 0.0003085136413574219.
Elapsed time 0.00048232078552246094.
Elapsed time 0.0002384185791015625.
```
Here is the consumers' code:
```python
from channels.generic.websocket import AsyncJsonWebsocketConsumer, AsyncConsumer
from time import time
from channels.layers import get_channel_layer
from asgiref.sync import async_to_sync, sync_to_async


class EchoConsumer1(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()
        await self.channel_layer.group_add("test", self.channel_name)

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard("test", self.channel_name)

    async def receive_json(self, content):
        start = time()
        await self.channel_layer.group_send(
            "test",
            {
                "type": "chat.message",
                "json": content
            }
        )
        print("Elapsed time {}.".format(time() - start))

    async def chat_message(self, event):
        await self.send_json(event["json"])


class EchoConsumer2(AsyncConsumer):
    async def websocket_connect(self, event):
        await self.send({
            "type": "websocket.accept",
        })
        await self.channel_layer.group_add("test", self.channel_name)

    async def websocket_send(self, event):
        await self.send(event)

    async def websocket_receive(self, event):
        start = time()
        channel_layer = get_channel_layer()
        await channel_layer.group_send(
            "test",
            {
                "type": "websocket.send",
                "text": event["text"],
            }
        )
        print("Elapsed time {}.".format(time() - start))


class EchoConsumer3(AsyncConsumer):
    async def websocket_connect(self, event):
        await self.send({
            "type": "websocket.accept",
        })

    async def websocket_receive(self, event):
        start = time()
        await self.send({
            "type": "websocket.send",
            "text": event["text"],
        })
        print("Elapsed time {}.".format(time() - start))
```
I'll try to look into this in the next couple of weeks.
Is Redis even involved in the third consumer?
I'm also experiencing very slow behavior when using groups.
Any update on this? Or is there a solution that hasn't come up here? (We have this issue, but we aren't on the latest versions. I would upgrade to the latest/working version if I could see it was worth it.)
any update
Many thanks to @andrewgodwin for all your hard work on this great project! Are there any updates regarding this bug?
I am considering whether to use Django Channels long-term on our site in production (we were hit by the same problem in testing), but I see that this bug has been open for over a year.
any update
These comments don't add anything. You can see there's no update; otherwise there'd be an update... 🙂
This needs someone experiencing the issue to spend the time to dig into it. What we don't have here is an answer to *why* it's happening.
Short of that, it'll just have to sit for a while.
I have re-tested to check whether the issue is still present (also looking at some differences between versions of channels-redis and channels).
These are my results:
## channels 2.2.0 + channels_redis 2.2.1

| consumer | mean | variance | max | min |
|----------|----------|------------|----------|----------|
| v1 | 0.003962 | 4.2704E-7 | 0.011806 | 0.003338 |
| v2 | 0.004009 | 2.4160E-7 | 0.006387 | 0.003317 |
| v3 | 0.000079 | 1.1771E-9 | 0.000275 | 0.000055 |

## channels 2.3.1 + channels_redis 2.3.3

| consumer | mean | variance | max | min |
|----------|----------|------------|----------|----------|
| v1 | 0.000831 | 1.8120E-8 | 0.002146 | 0.000706 |
| v2 | 0.000826 | 3.7920E-8 | 0.002566 | 0.000683 |
| v3 | 0.000069 | 8.9345E-10 | 0.000288 | 0.000053 |

## channels 2.4.0 + channels_redis 2.4.2

| consumer | mean | variance | max | min |
|----------|----------|------------|----------|----------|
| v1 | 0.000814 | 6.0222E-9 | 0.001563 | 0.000715 |
| v2 | 0.000810 | 2.2231E-8 | 0.002422 | 0.000684 |
| v3 | 0.000069 | 6.4959E-10 | 0.000269 | 0.000054 |
Each test was performed with 200 requests to the same echo consumer. It seems that after v2.2 there is a performance improvement (~5× faster), but even with just 1 group performance is quite bad compared to a simple send (~12× slower).
Some good work was done, but I believe it is bad to slow down by an order of magnitude with a single group.
However, for the sake of completeness, I will try a more complex test with more than 1 group to see whether there are drawbacks or not.
Hi @sevdog. Thanks. If you can work out where the time is being spent, that should help... 🙂
I ran a test logging times inside the `group_send` function at 3 different points:
- after connection (line 614)
- after mapping channels to connections (line ~628)
- at the end

I calculated the elapsed time from when the function was initially called. These are the time stats (in seconds):
| stage | mean | variance | max | min |
|----------------|----------|------------|----------|----------|
| connection | 0.000024 | 4.6381E-11 | 0.000082 | 0.000020 |
| map channels | 0.000459 | 1.2593E-8 | 0.001299 | 0.000402 |
| whole function | 0.000903 | 3.4705E-8 | 0.002230 | 0.000780 |
From these data it seems that `_map_channel_keys_to_connection` takes about the same time as the following for loop.
This was still tested with just 1 group.
I will do some other checks with 2 and 10 groups as soon as I can.
I have to apologize for my previous statement. It seems that what takes about 0.000210 seconds (with a single client connected) is this part:
https://github.com/django/channels_redis/blob/0c27648b60a1918c867b96a5ad3f67ef58ccceda/channels_redis/core.py#L614-L622
I have also performed another test with 3 different clients connecting to my 3 test consumers; these are the times:
## channels 2.2.0 + channels_redis 2.2.1

| consumer | mean | variance | max | min |
|----------|----------|------------|----------|----------|
| v1 | 0.002098 | 1.0687E-7 | 0.003900 | 0.001585 |
| v2 | 0.002735 | 1.2245E-6 | 0.010026 | 0.001693 |
| v3 | 0.000074 | 3.8980E-10 | 0.000122 | 0.000052 |

## channels 2.3.1 + channels_redis 2.3.3

| consumer | mean | variance | max | min |
|----------|----------|------------|----------|----------|
| v1 | 0.002400 | 1.2855E-6 | 0.010583 | 0.001343 |
| v2 | 0.002093 | 1.0505E-7 | 0.002888 | 0.001016 |
| v3 | 0.000078 | 6.7096E-10 | 0.000119 | 0.000048 |

## channels 2.4.0 + channels_redis 2.4.2

| consumer | mean | variance | max | min |
|----------|----------|------------|----------|----------|
| v1 | 0.001904 | 5.5975E-8 | 0.003261 | 0.001259 |
| v2 | 0.001906 | 5.8758E-8 | 0.003391 | 0.001102 |
| v3 | 0.000065 | 2.4156E-10 | 0.000095 | 0.000050 |
I will try to add more automation to my current tests to get a better point of view of how the situation evolves with more clients (maybe with a graph or something similar).
Any updates on this Issue?
I have done some other tests with a growing number of clients connecting to these test consumers (1, 6, 11, 16, 21, 26 and 31).
I doubt these results mean much on their own: I also need a comparison with channels v1, and this should really be tested in a multi-host environment. My tests were performed on my machine with Docker, but they should be done in a semi-real environment (server and clients separated).
Also, a better way of collecting time-related data should be used (instead of print/log).
It seems that Django Channels is not really used in production environments. This problem is a no-go.
I have not tested against the previous major versions (channels 1.x / asgi_redis 1.x), so this is not a full comparison between major versions.
Also, as I stated before, an accurate test requires different hosts for clients and server. This requires more time (and money).
Moreover, my tests were only performed on Python 3.6; other versions should be tested too, because this may be related to how asyncio is handled.
Is there a way to send directly between consumers in the meantime? It seems like the current problem is calling:

```python
# add people to a group, then:
await self.channel_layer.group_send("targetgroup", message)
```

but what about:

```python
# manually create a group, or know its members beforehand
for channel_name in group_channels:
    await self.channel_layer.send(channel_name, message)
```

I'll try to test this soon; however, managing groups ourselves seems like a pain.
EDIT: reading the docs, it says that you can maintain the groups yourself. I'll have a look at just using `channel_layer.send()` vs `channel_layer.group_send()` with 1-3 consumers.
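For anyone curious what "maintaining the groups yourself" might look like, here is a minimal sketch. The class and method names (`ManualGroups`, `send_to_group`) are made up for illustration; the only channel-layer API it relies on is the documented point-to-point `send(channel, message)`:

```python
import asyncio


class ManualGroups:
    """Hypothetical helper: track group membership ourselves and fan out
    with individual channel_layer.send() calls instead of group_send()."""

    def __init__(self):
        self._groups = {}  # group name -> set of channel names

    def add(self, group, channel_name):
        self._groups.setdefault(group, set()).add(channel_name)

    def discard(self, group, channel_name):
        self._groups.get(group, set()).discard(channel_name)

    async def send_to_group(self, channel_layer, group, message):
        # One point-to-point send per member, issued concurrently.
        await asyncio.gather(*[
            channel_layer.send(name, message)
            for name in self._groups.get(group, ())
        ])
```

A consumer would call `groups.add("test", self.channel_name)` in `connect` and `groups.discard(...)` in `disconnect`; note this bookkeeping lives in one process only, which is exactly the cross-process problem `group_send` exists to solve.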
I ran a quick test comparing:

```python
self.channel_layer.send(self.channel_name, message)
```

vs

```python
self.channel_layer.group_send(group_containing_only_myself, message)
```

and got 1 ms and 2 ms respectively. I also noticed that the v3 test doesn't even use Redis? Running v3 gives me the ~0.000005 s (5 thousandths of a ms) that the previous testers got.
What does `self.send()` do? It seems like it's a websocket send, not a Redis send?
Is the goal to get inter-process communication to be as fast as a websocket send? That makes sense on some level; but which portion — Python pickling the message, sending it to Redis, or receiving from Redis and unpickling — is the bottleneck?
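One rough way to answer that is to time the serialization step in isolation. This is just an illustrative sketch with stdlib `json` (the channel layer actually uses msgpack, but the order of magnitude is comparable); the message shape mirrors the test consumers above:

```python
import json
import time

# Roughly the shape of the messages the echo consumers send.
message = {"type": "websocket.send", "text": "x" * 1024}

iterations = 10_000
start = time.perf_counter()
for _ in range(iterations):
    blob = json.dumps(message).encode()   # serialize
    decoded = json.loads(blob)            # deserialize
per_message = (time.perf_counter() - start) / iterations

# If per_message is in the microseconds while group_send takes
# milliseconds, serialization is not the bottleneck: the Redis
# round trips and event-loop scheduling are.
```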
#223 was related. (Speed up reported switching redis connection to unix sockets.)
Channels appears to create a number of lua scripts that get executed. Is it worth considering "pre-loading" those scripts into redis at start-up and then just call them during processing? I'm wondering whether or not some of these delays are caused by the scripts being loaded and parsed before being executed. (Unfortunately, I don't know of any way of measuring that.)
Hey @KenWhitesell. :wave: I don't think that's the issue. They're cached on first load. https://github.com/django/channels_redis/commit/572aa946b58b8187da542efb375d73bf526030d7 was added to make the most of this.
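For context on how that caching works: Redis stores a loaded script server-side and addresses it by the SHA1 hex digest of its source, so `EVALSHA` can invoke it without resending the body on every call. A small sketch of the digest side (the Lua snippet here is made up for illustration):

```python
import hashlib

# Hypothetical Lua script; Redis caches scripts server-side after the
# first EVAL (or an explicit SCRIPT LOAD) and EVALSHA addresses them
# by the SHA1 hex digest of the script source.
lua_script = "return redis.call('LLEN', KEYS[1])"
script_sha = hashlib.sha1(lua_script.encode()).hexdigest()
# A client can then run: EVALSHA <script_sha> 1 some-key
```

So after the first call, only the 40-character digest crosses the wire, not the script text, which is why script loading is unlikely to explain the per-message latency here.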
I just used yappi to do a more accurate profiling of this call, using this code.
Using 32 clients (which were processes on another host), I got the following results:
```
name                          ncall  tsub      ttot      tavg
RedisChannelLayer.group_send  1      0.003629  0.005724  0.005724
RedisChannelLayer.group_send  1      0.004395  0.006690  0.006690
RedisChannelLayer.group_send  1      0.006053  0.009821  0.009821
RedisChannelLayer.group_send  1      0.004937  0.007409  0.007409
RedisChannelLayer.group_send  1      0.004702  0.008454  0.008454
```
I then tried to figure out how the internal code of the method could be improved and got interested in this piece of code:
https://github.com/django/channels_redis/blob/243eb7e4ecfc2e922508e50210884d3c9bae5403/channels_redis/core.py#L665-L668
Since these tasks have no dependencies that would prevent concurrent execution, this can be improved by fanning them out with `asyncio.gather`. So I tried this code:
```python
await asyncio.gather(*[
    connection.zremrangebyscore(
        key, min=0, max=int(time.time()) - int(self.expiry)
    )
    for key in channel_redis_keys
])
```
Which gives the following time results:
name ncall tsub ttot tavg
RedisChannelLayer.group_send 1 0.002697 0.005851 0.005851
RedisChannelLayer.group_send 1 0.001918 0.003774 0.003774
RedisChannelLayer.group_send 1 0.004098 0.008801 0.008801
RedisChannelLayer.group_send 1 0.002748 0.005214 0.005214
RedisChannelLayer.group_send 1 0.003148 0.006863 0.006863
Within my test this reduced both the overall and the internal time.
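To illustrate why the `gather` version helps, here is a self-contained sketch where `asyncio.sleep` stands in for a single Redis round trip (the function and key names are made up; only the concurrency pattern matches the change above):

```python
import asyncio
import time


async def fake_zremrangebyscore(key):
    # Stand-in for one Redis round trip (~10 ms of network latency).
    await asyncio.sleep(0.01)


async def compare():
    keys = ["asgi:specific:chan%d" % i for i in range(10)]

    # Original loop: round trips happen one after another,
    # so latency adds up per key.
    start = time.perf_counter()
    for key in keys:
        await fake_zremrangebyscore(key)
    sequential = time.perf_counter() - start

    # gather version: all round trips are in flight at once,
    # so total time is roughly one round trip.
    start = time.perf_counter()
    await asyncio.gather(*[fake_zremrangebyscore(key) for key in keys])
    concurrent = time.perf_counter() - start

    return sequential, concurrent


sequential, concurrent = asyncio.run(compare())
```

With 10 keys the sequential loop takes roughly 10× as long as the gathered version, which matches the direction of the profiling numbers above.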
Hey sevdog, thanks for the update. Once Django 3.2a1 is out of the way I will be swinging back to Channels, so I will take a look at this. 👍
@carltongibson any updates here?
@carltongibson @andrewgodwin anything new? I would like to implement channels for a scalable application (expecting 1000+ users in a group). If we cannot do that in the near future, should I consider implementing my own channel layer?
Are there any alternatives? What service should I use for the layer?
Hi both.
No, this is still pending investigation. I've not had bandwidth as yet. If you'd like to dig in that would be super!
I have a doubt, @carltongibson. Here Andrew says that Channels should not be used for extensive usage in the first place, and that we need to roll our own solutions for sending messages between multiple nodes. If this issue is fixed, does that mean we can use Channels for sending messages to hundreds of nodes per group (assuming that group sending would become faster)?
@aryan340 without testing there's no real way to say. I would suspect though that at scale you'd want to be looking into more fully featured and battle tested message brokers. There's a lot in this space that's simply out of scope for us here.
For those following this issue, it might be interesting for you to know there is a new channel layer implementation which is a lightweight layer on top of Redis Pub/Sub.
Redis Pub/Sub is efficient and fast. This new channel layer thinly wraps it so we can borrow its speed/efficiency. If you run a ton of messages through your app's channel layer (particularly if you use `group_send()` a lot), then I expect you'll see a big performance boost by switching to this new channel layer (we did: we've been running it in production for 6 months and it's been great).
It's called `RedisPubSubChannelLayer`. See channels_redis's README for info on how to use it. It's still considered alpha-stage, so proceed with caution and try it first in your dev/staging environment. Please file an issue if you find one.
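For reference, switching is a small settings change. A sketch (the host/port are placeholders; see the README for the options your setup needs):

```python
# settings.py (sketch)
CHANNEL_LAYERS = {
    "default": {
        # Pub/Sub-based layer instead of channels_redis.core.RedisChannelLayer
        "BACKEND": "channels_redis.pubsub.RedisPubSubChannelLayer",
        "CONFIG": {
            "hosts": [("localhost", 6379)],
        },
    },
}
```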
Hope it helps someone!
Disclaimer: I started the `RedisPubSubChannelLayer`, so I'm biased toward positive feelings about it. But it has already had other awesome people contribute fixes and improvements (biggest shoutout to @qeternity). :+1: