[feature] Integration with native application storage for calling channel history and recovery
The current Centrifugo history model assumes that channel history is kept in the Centrifugo Engine storage, i.e.:
- kept in memory in case of the Memory engine
- kept in Redis or KeyDB when using the Redis engine
- kept in Tarantool spaces when using the Tarantool engine
This provides reliable delivery in a pretty efficient way. But it also assumes that Centrifugo storage is a source of truth, requiring applications to retry publish requests until a publication is accepted by Centrifugo, or to use something like a transactional outbox pattern to publish events to Centrifugo. While this works fine in practice, and paid cloud solutions like PubNub or Ably (not only Centrifugo) work this way, it's still an area where we don't have a clear solution for application developers yet.
The idea here is to integrate more tightly with the application's main database. For example, let's say the application uses PostgreSQL as its persistence storage. Let's also suppose that each client is subscribed to its own personal channel, like `user:1`, `user:2`, and so on.
The application can create a table for channel events: an `events` table with fields `channel`, `offset` and `publication`, probably also `expire_at`. `publication` represents a Centrifugo Publication; in case of PostgreSQL this can be a JSON field, for example. The table is actually very similar to our Engine implementation structures. In theory it can be sharded by channel (but in case of sharding applications should think about insert atomicity: channels should match the application's main sharding key).
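As a rough sketch, such a table could look like the following (PostgreSQL DDL wrapped in a Go constant; every name here is illustrative, nothing in this shape is prescribed by Centrifugo):

```go
package events

// Illustrative PostgreSQL schema for the events table described above.
// The UNIQUE (channel, "offset") constraint enforces the per-channel
// incremental offset and doubles as the index used by history range queries.
// "offset" is quoted because OFFSET is a reserved word in PostgreSQL.
const createEventsTable = `
CREATE TABLE IF NOT EXISTS events (
    id          BIGSERIAL PRIMARY KEY,
    channel     TEXT   NOT NULL,
    "offset"    BIGINT NOT NULL,   -- incremental per channel
    publication JSONB  NOT NULL,   -- Centrifugo Publication payload
    expire_at   BIGINT NOT NULL,   -- unix seconds, for periodic cleanup
    UNIQUE (channel, "offset")
)`
```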
Every time the application wants to send an update to a user channel, it can save an event to the `events` table:
| id | channel | offset | publication | expire_at  |
|----|---------|--------|-------------|------------|
| 1  | user:1  | 1      | {...}       | 1658820447 |
| 2  | user:2  | 1      | {...}       | 1658820448 |
| 3  | user:1  | 2      | {...}       | 1658820449 |
The important thing is that `offset` should be incremental for every channel. The application should then publish the event to the Centrifugo API as usual. But now, in case of failure, it can just skip the error and proceed, since the event was already saved to the `events` table (in a transaction), and Centrifugo already has logic to detect publications missed from the PUB/SUB system. For persistent storages like PostgreSQL, having an `epoch` is not really required, so it can always be an empty string, for example.
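A minimal sketch of this write path, assuming `database/sql` with a PostgreSQL driver and the Centrifugo server HTTP API (`POST /api` with `{"method": "publish", ...}` as in v3/v4); table/column names and the offset generation strategy are illustrative only:

```go
package events

import (
	"bytes"
	"context"
	"database/sql"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// SaveAndPublish commits the event row first and then best-effort publishes
// it to Centrifugo. A failed publish is only logged: subscribers can recover
// missed publications from history, because the committed row is the source
// of truth.
func SaveAndPublish(ctx context.Context, db *sql.DB, apiAddr, apiKey, channel string, publication json.RawMessage) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	// Assign the next per-channel offset. Under concurrent writers the
	// UNIQUE (channel, "offset") constraint turns a race into an error the
	// caller can retry; a per-channel sequence or advisory lock would avoid it.
	var offset int64
	err = tx.QueryRowContext(ctx,
		`INSERT INTO events (channel, "offset", publication, expire_at)
		 VALUES ($1,
		         COALESCE((SELECT MAX("offset") FROM events WHERE channel = $1), 0) + 1,
		         $2, $3)
		 RETURNING "offset"`,
		channel, string(publication), time.Now().Add(24*time.Hour).Unix(),
	).Scan(&offset)
	if err != nil {
		return err
	}
	if err := tx.Commit(); err != nil {
		return err
	}

	// Publish over the Centrifugo server HTTP API as usual.
	body, _ := json.Marshal(map[string]any{
		"method": "publish",
		"params": map[string]any{"channel": channel, "data": publication},
	})
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, apiAddr+"/api", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "apikey "+apiKey)
	if resp, err := http.DefaultClient.Do(req); err != nil {
		log.Printf("publish to Centrifugo failed, relying on recovery: %v", err)
	} else {
		resp.Body.Close()
	}
	return nil
}
```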
The difference is that instead of using native Centrifugo history, users can tell Centrifugo to use the history kept in the application storage. So whenever Centrifugo would normally issue a history request to the Engine, it will instead send a request to an application endpoint, in a way similar to our current proxy implementations.
The application should provide an endpoint and respond to history requests according to Centrifugo's history semantics. This will allow automatic state recovery in a channel using the application's main storage.
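No such "history proxy" exists in Centrifugo today, so the request/response shapes below are only an assumption about what the contract could look like (channel plus optional since position and limit, answered with publications, top offset and epoch); the exact contract would be defined by this feature:

```go
package events

import (
	"database/sql"
	"encoding/json"
	"net/http"
)

// Hypothetical history proxy request/response shapes, for illustration only.
type historyRequest struct {
	Channel string `json:"channel"`
	Since   *struct {
		Offset int64  `json:"offset"`
		Epoch  string `json:"epoch"`
	} `json:"since,omitempty"`
	Limit int `json:"limit"`
}

type historyResponse struct {
	Publications []json.RawMessage `json:"publications"`
	Offset       int64             `json:"offset"` // current top offset in the channel
	Epoch        string            `json:"epoch"`  // may stay empty for persistent storage
}

// HistoryHandler answers history requests from the events table.
func HistoryHandler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req historyRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		resp := historyResponse{Publications: []json.RawMessage{}}

		// The top offset must be returned even if no publications match,
		// so that Centrifugo can verify client positions.
		if err := db.QueryRowContext(r.Context(),
			`SELECT COALESCE(MAX("offset"), 0) FROM events WHERE channel = $1`,
			req.Channel).Scan(&resp.Offset); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		var since int64
		if req.Since != nil {
			since = req.Since.Offset
		}
		limit := req.Limit
		if limit <= 0 {
			limit = 1000
		}
		rows, err := db.QueryContext(r.Context(),
			`SELECT publication FROM events
			 WHERE channel = $1 AND "offset" > $2
			 ORDER BY "offset" ASC
			 LIMIT $3`,
			req.Channel, since, limit)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer rows.Close()
		for rows.Next() {
			var pub []byte
			if err := rows.Scan(&pub); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			resp.Publications = append(resp.Publications, json.RawMessage(pub))
		}
		if err := rows.Err(); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(resp)
	}
}
```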
The benefit here is that the `events` table may be updated transactionally by the application.
The main downside (besides some additional work on the app side) is performance: where we currently have efficient storage implementations based on Redis/KeyDB/Tarantool, we will issue lots of requests to the application if we integrate with the app API for history retrieval. History is called not only during explicit client requests and resubscribes with recovery, but also during the connection lifetime, since we need to detect message loss (caused by the at most once semantics of publishing). Some things which may reduce the impact:
- a larger minimum reconnect/resubscribe time on the client side (can already be configured)
- tunable position check intervals (we already have them)
- using the singleflight feature (currently available in PRO) - we already skip a position check if we have recently seen a publication with the correct offset
None of the techniques above will fully eliminate the load generated by many connections with many channels, but I believe this can be an acceptable tradeoff for some cases.
I don't have a use case for this personally, so I'm opening this for discussion and thoughts from interested parties.
Probably this is something that may arise at some point in the future, but closing for now: it's something I'd like to avoid stepping into without proper justification.