[meta] Event Cache should directly be integrated inside the sync response flow
Right now, when a sync response is received, it goes in the sync response flow. It starts at Client::receive_sync_response. At the end, it emits a RoomUpdates on a broadcast channel. On the other side of this channel, EventCache is listening.
Why this decoupled design? If the Event Cache is slow, it's not blocking the sync response, so the whole sync loop (!).
It brings some disadvantages though:
- When an error happens in the Event Cache, it's harder to trace it to the sync response
- It means we can't “retry” a sync iteration, or do something “proper” with the sync data
- When Event Cache has performance issues (it happened recently due to regressions), we don't see it immediately because it runs in parallel, and it's harder to correctly connect them to the real reasons
- It takes more memory, so more energy, than it should:
- Everything that goes in a channel must be cloned, so all events are duplicated
Proposal:
- Event Cache is plugged directly inside the sync response flow
- We can apply the updates on the Event Cache with a parallel iterator per room (hello
rayon) to tackle the performance constraint (if the multiple locks allow that, we need to investigate that anyway) - We can remove the
EventCache::listen_task- One less task!
- Less states in case of channel has lagged or is closed
- We can apply the updates on the Event Cache with a parallel iterator per room (hello
- Remove this channel
Advantages:
- Event cache errors can bubble up to the sync response flow
- We can decide to retry a sync if we detect an error
- It saves memory, so energy
- No magic. Straightforward and trivial flow!
Thanks for opening this issue after our live discussion!
We can remove the EventCache::listen_task
I wonder if we should keep the EventCache enablement optional. As it happens, it's getting used in more and more places, so other components might start to depend on it by default (e.g. search). But enabling it by default would cause some overhead on sync processing too.
We can apply the updates on the Event Cache with a parallel iterator per room (hello rayon) to tackle the performance constraint (if the multiple locks allow that, we need to investigate that anyway)
Note: I've tried that in the past, and it was a slowdown. Didn't have time to investigate why, but I suspected Some™ lock contention.
Note: I've tried that in the past, and it was a slowdown. Didn't have time to investigate why, but I suspected Some™ lock contention.
That's why I was mentioning locks here 😉.
Re. removing the broadcast channel.. It was added before the event cache was a thing, and I think it's public? So probably good to check with downstreams before removing it.
@jplatte Sure. Maybe the mechanism will stay, but the Event Cache won't listen to it anymore.