[feature] presence with a massive number of active users
Is your feature request related to a problem? Please describe.
Presence works well (for channels with a reasonably small number of active subscribers), but it has a natural limit as the number of users in a channel grows, because of the implementation and the message definition in https://github.com/centrifugal/protocol/blob/43664d12bdd9086315ccdb96c742618a4ea6b3b0/client.proto#L245
Describe the solution you'd like
I am considering adding a new command that scans all users from the engine (memory/redis) and sends them back in batched replies.
Like this:
for {
	// 1. scan a page of presence entries from the Redis hash (HSCAN)
	// 2. build a repeated (batched) response from that page
	// 3. send the batch to the client
	if loopIsDone { // e.g. the HSCAN cursor returned to 0
		break
	}
}
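A minimal sketch of what such a batched scan could look like on the server side, assuming go-redis (github.com/redis/go-redis/v9) and one presence hash per channel - the key name, batch size and helper names below are illustrative, not what the Redis engine actually uses:

package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// scanPresence walks a channel's presence hash with HSCAN and hands each page
// to send, so the full presence never has to fit into a single reply.
func scanPresence(ctx context.Context, rdb *redis.Client, channel string, send func(batch map[string]string) error) error {
	key := "centrifuge.presence.data." + channel // illustrative key name
	var cursor uint64
	for {
		// 1. scan a page of hash entries
		fields, next, err := rdb.HScan(ctx, key, cursor, "*", 1000).Result()
		if err != nil {
			return err
		}
		// 2. build a batched response (HSCAN returns alternating field/value pairs)
		batch := make(map[string]string, len(fields)/2)
		for i := 0; i+1 < len(fields); i += 2 {
			batch[fields[i]] = fields[i+1]
		}
		// 3. send the batch to the client
		if err := send(batch); err != nil {
			return err
		}
		if next == 0 { // cursor back at 0 means the iteration is done
			return nil
		}
		cursor = next
	}
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	_ = scanPresence(context.Background(), rdb, "chat:index", func(batch map[string]string) error {
		fmt.Println("got batch of", len(batch), "entries")
		return nil
	})
}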
Describe alternatives you've considered
no
Additional context
Sometimes it could be useful for an admin to retrieve all users inside one big channel.
Hello @morya ,
You are talking about admin needs but showing the client protocol - I think this should not be part of it, at least at first. For now, a method on Node could be a good start.
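As a rough illustration of what a Node-level method could look like - the option and result types and the method name here are hypothetical, not part of the current centrifuge API:

package presence

// Hypothetical API sketch, not something centrifuge exposes today.

// ClientInfo stands in for centrifuge's ClientInfo type in this sketch.
type ClientInfo struct {
	ClientID string
	UserID   string
}

// PresencePageOptions describes a single page request.
type PresencePageOptions struct {
	Cursor string // opaque cursor from the previous page, empty for the first page
	Limit  int    // maximum number of entries per page
}

// PresencePage is one chunk of a channel's presence.
type PresencePage struct {
	Presence   map[string]ClientInfo
	NextCursor string // empty when iteration is finished
}

// A paginated presence method on Node could then look roughly like:
//
//	func (n *Node) PresencePaginated(ch string, opts PresencePageOptions) (PresencePage, error)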
Presence for each channel is now kept in a single Redis HASH - so there is nothing to scan actually, it's a single key.
In general I'd like to provide more scalability here; it requires more thinking and analysis than provided above.
Yes, it's an admin API, but the client protocol has a presence call too.
What I meant with Redis scan: HGETALL could potentially block Redis for a while.
And, true, Redis SCAN is not the best solution here.
I was thinking about a sync method, like sync between etcd nodes, or SYNC/PSYNC between Redis master and replica.
It could be heavy lifting...
Clients would always see all users and would never miss a single join/leave status; maybe a filter to see a range of users.
BR
Another way could be to shard presence to load it chunk by chunk from different keys, sth like pagination. For example, if you are in a channel with 100k subscribers, we could set shard number == 10 and then somehow distribute information over those keys. Not sure about the exact algorithm and API for this – just an idea for now.
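To make the sharding idea a bit more concrete, a sketch of distributing clients over a fixed number of shard keys could look like this (the hash choice and key naming are just for illustration):

package main

import (
	"fmt"
	"hash/fnv"
)

const presenceShards = 10 // e.g. shard number == 10 for a 100k channel

// presenceShardKey maps a client ID to one of the shard keys of a channel,
// so presence can be written to and loaded back chunk by chunk.
func presenceShardKey(channel, clientID string) string {
	h := fnv.New32a()
	h.Write([]byte(clientID))
	shard := h.Sum32() % presenceShards
	return fmt.Sprintf("centrifuge.presence.data.%s.%d", channel, shard) // illustrative naming
}

func main() {
	fmt.Println(presenceShardKey("chat:index", "client-42"))
}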
I'd also like to mention that in Centrifugo PRO we approach the need for massive presence analysis by using ClickHouse analytics - i.e. using a system which can provide access to massive data in near real-time.
For now, we use a hacky method: reading from Redis directly with the command hscan xxx.
But it's neither quite an elegant nor an accurate method when both subscribing to join/leave messages and reading from the hscan results - there is no way to keep the data eventually consistent.
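To make the consistency problem concrete, the merge step of such a hack looks roughly like this (simplified; the type and method names are ours, not centrifuge's):

package presencehack

// presenceView merges an HSCAN snapshot with join/leave messages. Because
// nothing ties the snapshot to a position in the join/leave stream, events can
// be applied twice or missed, so the view is only approximately correct.
type presenceView struct {
	clients map[string]struct{}
}

func (v *presenceView) applySnapshot(snapshot map[string]string) {
	v.clients = make(map[string]struct{}, len(snapshot))
	for clientID := range snapshot {
		v.clients[clientID] = struct{}{}
	}
}

func (v *presenceView) applyJoin(clientID string) {
	if v.clients == nil {
		v.clients = make(map[string]struct{})
	}
	v.clients[clientID] = struct{}{}
}

func (v *presenceView) applyLeave(clientID string) {
	delete(v.clients, clientID)
}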
That's why I mentioned sync between etcd nodes, or SYNC/PSYNC between Redis master and replica.
I think it could be a way: subscribe to the PSYNC protocol data from a Redis follower, parse it, and convert it into join/leave messages.
But it seems quite complicated.
Yes, it's complicated... Possibly, for such a scale you need a different model, with some approximation. Sth like the mentioned approach with ClickHouse, or some other store. It heavily depends on the target use case – since you have not described it, I am just trying to give you alternative directions of thinking.
Thanks, really appreciate it.