
[feature] presence with massive amount of active users

Open morya opened this issue 1 year ago • 6 comments

Is your feature request related to a problem? Please describe.

presence

It works well, though only for channels with a reasonably small number of active subscribers.

However, it hits a natural limit as the number of users in a channel grows,

because of the implementation and the message definition at https://github.com/centrifugal/protocol/blob/43664d12bdd9086315ccdb96c742618a4ea6b3b0/client.proto#L245

Describe the solution you'd like

I am considering adding a new command that scans all users from the engine (memory/Redis) and sends them to the client in batched replies. Like this:

for {
  // 1. scan redis hash keys
  // 2. build repeated response
  // 3. send to client
  if loopIsDone { 
    break
  }
}
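To make the loop above concrete, here is a minimal Go sketch of cursor-based batching. It is only an illustration under assumptions: `presenceStore`, `memoryStore`, and `streamPresence` are hypothetical names and not part of Centrifuge, and the in-memory store merely imitates the shape of a cursor iteration such as Redis HSCAN (cursor 0 on return means the iteration is finished).

```go
package main

import (
	"fmt"
	"sort"
)

// presenceStore is a hypothetical stand-in for the engine (memory or Redis).
// Scan returns one batch plus the cursor for the next call; a returned
// cursor of 0 means the iteration is finished.
type presenceStore interface {
	Scan(cursor, count int) (clientIDs []string, next int)
}

// memoryStore is an in-memory implementation used only for this sketch.
type memoryStore struct {
	ids []string // sorted client IDs
}

func newMemoryStore(ids []string) *memoryStore {
	sorted := append([]string(nil), ids...)
	sort.Strings(sorted)
	return &memoryStore{ids: sorted}
}

func (m *memoryStore) Scan(cursor, count int) ([]string, int) {
	if cursor >= len(m.ids) {
		return nil, 0
	}
	end := cursor + count
	if end >= len(m.ids) {
		return m.ids[cursor:], 0
	}
	return m.ids[cursor:end], end
}

// streamPresence walks the store batch by batch and passes each
// non-empty batch to send, stopping on the first error.
func streamPresence(store presenceStore, batchSize int, send func(batch []string) error) error {
	cursor := 0
	for {
		batch, next := store.Scan(cursor, batchSize)
		if len(batch) > 0 {
			if err := send(batch); err != nil {
				return err
			}
		}
		if next == 0 {
			return nil
		}
		cursor = next
	}
}

func main() {
	store := newMemoryStore([]string{"c1", "c2", "c3", "c4", "c5"})
	_ = streamPresence(store, 2, func(batch []string) error {
		fmt.Println(batch) // in a real server this would be a reply to the client
		return nil
	})
}
```

Keeping the store behind an interface means the same loop could be pointed at a Redis-backed implementation later, though a real HSCAN may return duplicates and gives no fixed batch size, which this sketch does not model.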

Describe alternatives you've considered

no

Additional context

Sometimes it could be useful for an admin to retrieve all users in one big channel.

morya avatar Nov 13 '23 11:11 morya

Hello @morya ,

You are talking about admin needs but showing the client protocol. I think this should not be part of it, at least at first. For now, a method for Node could be a good start.

Presence for each channel is currently kept in a single Redis HASH, so there is actually nothing to scan: it's a single key.

In general I'd like to provide more scalability here, but it requires more thinking and analysis than provided above.

FZambia avatar Nov 15 '23 09:11 FZambia

Yes, it's an admin API, but the client protocol has a presence call too.

What I mean by Redis scan is that HGETALL could potentially block Redis for a while.

And true, Redis scan is not the best solution here.

I was thinking of a sync method, like the sync between etcd nodes, or sync/psync between Redis master/slave.

It could be heavy lifting...

Clients would always see all users and would not miss a single join/leave status; maybe there could be a filter to see a range of users.

BR

morya avatar Nov 15 '23 09:11 morya

Another way could be to shard presence so that it can be loaded chunk by chunk from different keys, something like pagination. For example, if you are in a channel with 100k subscribers, we could set the shard number to 10 and then somehow distribute the information over those keys. Not sure about the exact algorithm and API for this; just an idea for now.

I'd also like to mention that in Centrifugo PRO we address the need for massive presence analysis with ClickHouse analytics, i.e. a system which can provide access to massive data in near real-time.

FZambia avatar Dec 14 '23 06:12 FZambia

For now, we use a hack: reading from Redis directly with the command hscan xxx.

But it is neither an elegant nor an accurate method when you both subscribe to join/leave messages and read from the hscan results.

There is no way to keep the data eventually consistent.

That's why I mentioned the sync between etcd nodes, or sync/psync between Redis master/slave.

I think one way could be to subscribe to the psync protocol data from a follower Redis, parse it, and convert it into join/leave messages.

But it seems quite complicated.
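For comparison, a simpler pattern than parsing replication data is to buffer join/leave events while the snapshot is being read and replay them afterwards. The Go sketch below is a rough illustration only: `presenceView` and `event` are hypothetical names, and it assumes the event subscription starts before the snapshot read and that join/leave events arrive in order without loss, which a real deployment may not guarantee. It is also not safe for concurrent use as written.

```go
package main

import "fmt"

// event is a hypothetical join/leave notification for one client.
type event struct {
	join     bool
	clientID string
}

// presenceView is a local copy of channel presence built from a snapshot
// plus the join/leave events that arrived while the snapshot was loading.
type presenceView struct {
	loading bool
	members map[string]struct{}
	pending []event
}

func newPresenceView() *presenceView {
	return &presenceView{loading: true, members: make(map[string]struct{})}
}

func (v *presenceView) apply(e event) {
	if e.join {
		v.members[e.clientID] = struct{}{}
	} else {
		delete(v.members, e.clientID)
	}
}

// HandleEvent buffers events during the snapshot read and applies them
// directly once the snapshot has been merged.
func (v *presenceView) HandleEvent(e event) {
	if v.loading {
		v.pending = append(v.pending, e)
		return
	}
	v.apply(e)
}

// FinishSnapshot merges the snapshot, then replays buffered events in
// arrival order so that changes made during the scan win.
func (v *presenceView) FinishSnapshot(snapshot []string) {
	for _, id := range snapshot {
		v.members[id] = struct{}{}
	}
	for _, e := range v.pending {
		v.apply(e)
	}
	v.pending = nil
	v.loading = false
}

func main() {
	v := newPresenceView()
	// Events arriving while the snapshot (e.g. an hscan loop) is in progress.
	v.HandleEvent(event{join: false, clientID: "a"})
	v.HandleEvent(event{join: true, clientID: "d"})
	v.FinishSnapshot([]string{"a", "b", "c"})
	fmt.Println(len(v.members))
}
```

Replaying is idempotent here because a join for a known client or a leave for an unknown one changes nothing, so events already reflected in the snapshot do no harm.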

morya avatar Dec 14 '23 10:12 morya

Yes, it's complicated... Possibly, at such a scale you need a different model, with some approximation. Something like the mentioned approach with ClickHouse, or some other store. It heavily depends on the target use case; since you have not described it, I am just trying to give you alternative directions of thinking.

FZambia avatar Dec 28 '23 14:12 FZambia

Thanks, I really appreciate it.

morya avatar Dec 29 '23 08:12 morya