Split out mutable event content from event cache into new caches that are keyed by room ID
In the hopes of fixing https://github.com/matrix-org/synapse/issues/11521 and paving the way towards an immutable external event cache (https://github.com/matrix-org/synapse/issues/2123), a new architecture for the event cache is proposed.
The current state
Currently there are a couple of separate data structures related to caching event contents in memory in Synapse (see the code for fetching an event from cache/db here):

- `EventsWorkerStore._get_event_cache`
  - An instance of `AsyncLruCache` implemented as a map of `EventID` -> (`EventBase`, redacted `EventBase | None`).
  - This cache is populated after fetching an event from the database.
  - Entries in this cache are invalidated when an event is deleted from the database (in most cases, c.f. #11521), redacted or marked as rejected.
  - Entries in this cache are invalidated when the size limit of the `LruCache` is reached.
- `EventsWorkerStore._event_ref`
  - A `WeakValueDictionary` which serves as a single point of reference for `EventBase`s in memory, ensuring that we don't end up with multiple, unnecessary copies of a single `EventBase` in memory.
  - This data structure is populated after fetching an event from the database.
  - Because this is a `WeakValueDictionary`, entries in this cache are invalidated when all other references to the `EventBase` in an entry are gone.
  - Entries in this cache are invalidated when an event is deleted from the database (in most cases, c.f. #11521), redacted or marked as rejected.
  - Entries in this cache are not invalidated when an entry is evicted from `EventsWorkerStore._get_event_cache`, as something else may still be processing the event, even if it's been removed from that cache.
What's the problem?
See https://github.com/matrix-org/synapse/issues/11521; because each of these caches is keyed by EventID alone, it becomes tricky to invalidate them when all you have is a RoomID (i.e. when purging a room completely). We could query all known events for a room from the database, but that may result in millions of events. Ideally we'd have some map of RoomID -> EventID which only covers the events that are actually currently held in memory. We could then use that to invalidate both of these caches.
Additionally, as `_get_event_cache` contains mutable `EventCacheEntry`s (comprising an `EventBase` and a redacted `EventBase | None`), invalidating them is necessary when an event is either redacted or marked as rejected. These can differ per-homeserver, so removing this component from the cache entries opens up avenues for multiple homeservers sharing the same, immutable event cache.
Proposal
After speaking with @erikjohnston, we (mostly Erik :) came up with the following idea:
- `EventsWorkerStore._get_event_cache` would simply become a map of `EventID` -> `EventBase`.
- We add a separate cache which is a nested map of `RoomID` -> `EventID` -> `{rejected_status: bool, redacted_event_content: Optional[EventBase]}`.
  - Entries are added to this map when an event is pulled from the database. We know the RoomID at this point.
  - Entries are not invalidated from this map when an entry in `EventsWorkerStore._get_event_cache` is invalidated due to hitting the cache size limit.
  - This does mean that we'll need to know the RoomID when querying for rejected/redacted status though... but we can get that from the event cache?
The beauty of this is that we no longer need to invalidate the `_get_event_cache` at all (unless the size limit is hit)! Even in the room purge use case! How? Here are some examples of using this system:
Fetch EventID A which is not in-memory
- Some calling function asks for EventID A.
- This does not exist in `_get_event_cache` (nor the other caches), so we query the database. The event and related metadata are fetched from the DB (`event_json`, `redactions`, `rejections`) and both the `_get_event_cache` and event metadata cache are populated.
- Return information from the database.
Fetch EventID A which is in-memory
- Some calling function asks for EventID A.
- This already exists in `_get_event_cache`, and presumably the metadata cache. We take the RoomID from the `EventBase` in the `_get_event_cache` and query the event metadata cache.
- Return information from both caches.
Fetch EventID A which is in-memory but the event has been purged
- Some calling function asks for EventID A.
- This already exists in `_get_event_cache`, and presumably the metadata cache. We take the RoomID from the `EventBase` in the cache and query the event metadata cache - but uh oh, there's no matching entry in the metadata cache! The event must have been purged.
- We invalidate the entry in the event cache as well and return `None`.
Thus when purging a room, we only need to purge entries in the metadata cache (which we can easily do by RoomID due to the metadata cache's structure). Entries in the `_get_event_cache` and `_event_ref` will be invalidated as they are fetched.
I'm curious to hear from other members of the Synapse team on whether this sounds reasonable + cc @Fizzadar.
Questions:
- You refer to the "metadata caches": is this the proposed new cache, or something else?
- Are you proposing to retain `EventsWorkerStore._event_ref` or not?
- Under your third use case, "We invalidate the entry in the event cache as well": by "event cache", you mean `_get_event_cache`? But you've said we never invalidate that?
- It would be helpful to spell out the exact semantics of the properties of your new cache, especially `redacted_event_content`. I think you intend for it to be the same as that in `EventCacheEntry`, but that's not very well documented either.
- Are you assuming that calling functions will also pass in a Room ID? Or are you proposing to look up in `_get_event_cache` (or the db) before looking in your new cache? In any case, walking through the different steps in each use case more explicitly would be helpful.
- Your use cases talk about whether the event is in memory or not, but given you now have two caches with different expiry semantics, it's possible for an event to be in one cache and not the other. It might be better to think in terms of the different combinations here.
- Could you work through a couple of other use cases; in particular the things that cause invalidation in the caches (redactions, rejections, purges)?
I've finished a second write-up which should hopefully be clearer and answer the above questions.
Proposal
Currently we have both a `_get_event_cache` `LruCache` of type dict of `EventID` -> `EventCacheEntry` and an `_event_ref` `WeakValueDictionary` of `EventID` -> `EventBase`. `EventCacheEntry` is defined as:

```python
@attr.s(slots=True, auto_attribs=True)
class EventCacheEntry:
    event: EventBase
    redacted_event: Optional[EventBase]
```
Today `_get_event_cache` tracks both whether the event was redacted (`EventCacheEntry.redacted_event`) and whether it was rejected (`EventCacheEntry.event.rejected_reason`). I propose to modify the type of `_get_event_cache` to be an `LruCache` of type dict of simply `EventID` -> `EventBase`. Entries will now only need to be invalidated if the cache is full/entries expire, or if the event changes in the database (it is censored, expires or is purged).
Expiration of an event occurs when the undocumented `enable_ephemeral_messages` config option is enabled (it's an implementation of MSC2228, self-destructing messages). "Censoring" an event happens after the event has been redacted and the specified `redaction_retention_period` completes. When an event expires or is censored, it is pruned in the database (we should probably use one term for both of these cases). Redaction and rejected status of an event will no longer be tracked by `_get_event_cache`. This will instead be recorded in a separate cache.
Therefore, we define a new cache: `_event_metadata_cache`. It is a double-nested dictionary of `RoomID` -> `EventID` -> `EventMetadataCacheEntry`. This dictionary has no inherent size or time-based invalidation strategies (it's not an instance of `LruCache`). `EventMetadataCacheEntry` looks like:

```python
class EventMetadataCacheEntry:
    redacted_event: Optional[EventBase]
    rejected_reason: Optional[str]
```
If `redacted_event` is not `None`, then the associated event has been redacted. Likewise, if `rejected_reason` is not `None`, then the associated event was rejected.
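Those semantics could be made explicit with convenience properties, sketched below. This uses a stdlib dataclass rather than attrs purely to keep the sketch self-contained; the `is_redacted`/`is_rejected` helpers are hypothetical additions, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional

class EventBase:
    """Illustrative stand-in for Synapse's EventBase."""

@dataclass
class EventMetadataCacheEntry:
    redacted_event: Optional[EventBase] = None
    rejected_reason: Optional[str] = None

    @property
    def is_redacted(self) -> bool:
        # Redacted iff a redacted copy of the event has been recorded.
        return self.redacted_event is not None

    @property
    def is_rejected(self) -> bool:
        # Rejected iff a rejection reason has been recorded.
        return self.rejected_reason is not None

entry = EventMetadataCacheEntry(rejected_reason="auth_error")
assert entry.is_rejected and not entry.is_redacted
```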
Invalidation scenarios
In each invalidation scenario today we have direct access to the Room ID; therefore we can invalidate all three caches when an event changes: `_event_ref`, `_get_event_cache` and `_event_metadata_cache`.
Event redaction
- An event is marked for redaction.
- This happens in `PersistEventsStore._persist_events_txn`, where we have the `EventBase` and thus access to the Room ID.
- We currently invalidate the entry in `_get_event_cache` and `_event_ref` as they contain redaction metadata.
- After this proposal we "upsert" (invalidate and then re-add) the `_event_metadata_cache` entry for this event, setting `EventMetadataCacheEntry.redacted_event`.
  - Note that we can't just invalidate the entry in the `_event_metadata_cache`, else we'd then have an event in the `_get_event_cache` or `_event_ref` and not in `_event_metadata_cache`. That should only happen in the case of an event purged by a room purge!
Event expiration or censorship
- Event "expiration" only occurs when the undocumented `enable_ephemeral_messages` feature is enabled. The feature prunes the actual event in the database when it expires (this is NOT redaction). The expiration timestamp is listed in the event's content when sent.
  - This occurs in `MessageHandler._expire_event`, where we have the `EventBase` and thus the Room ID.
- Event "censorship" occurs regularly and is driven by the configured `redaction_retention_period` config option. When an event has been redacted and the configured time has passed, the event will be "censored", aka pruned in the database.
  - This occurs in `CensorEventsStore._censor_redactions`, where we have the `EventBase` and thus the Room ID.
- Since the event body has changed in the database, currently we invalidate the entries in `_event_ref` and `_get_event_cache`.
- After this proposal, we invalidate the entry in `_event_metadata_cache` as well, even though the event has not been redacted or rejected: it may otherwise never be invalidated and thus take up memory.
Event rejection (and un-rejection)
- An event is marked as rejected, or no longer rejected when it was previously.
- This happens in `StateGroupWorkerStore._update_state_for_partial_state_event_txn`, where we have the `EventBase` and thus the Room ID.
- After this proposal we "upsert" (invalidate and then re-add) the `_event_metadata_cache` entry for this event, setting `EventMetadataCacheEntry.rejected_reason`.
Persisting a new event
- When we persist a new event, we invalidate the event ID in `_get_event_cache` and `_event_ref`. If the event is a redaction, we additionally invalidate the event ID that it redacts.
  - Note: ~~For the event being persisted, I'm not actually sure why we invalidate the `_get_event_cache` and `_event_ref` in this instance.~~ An event may transition from an outlier to "ex-outlier". For example, when receiving an invite member event, that event is an outlier (we don't have the rest of the DAG to verify it). When we join the room, that event transitions to an ex-outlier. In this case, we update the "outlier" bit in the event's internal metadata, then invalidate the relevant event cache entry.
- This occurs in `PersistEventsStore._persist_events_txn`.
- After this proposal, upon a redaction, we no longer need to invalidate the `_get_event_cache` and `_event_ref` of the event that was redacted. Instead, we only need to upsert the record in the `_event_metadata_cache`. We have access to the room ID of the redacted event to do this.
Partial room purge
- The room is partially purged from the database, meaning a range of event IDs is purged. This is handled by `PurgeEventsStore._purge_history_txn`, and both the room ID and event IDs are known.
- We already invalidate the `_get_event_cache` and `_event_ref` here, and could do the same with `_event_metadata_cache`. As the code notes though, there are race conditions. I'm not convinced that invalidating the cache by room ID helps with that at all.
`_get_event_cache` (an `LruCache`) reaches a size or time limit that triggers invalidation

In this case, entries are intentionally left in `_event_ref`, as the events themselves have not changed. They will eventually be invalidated from `_event_ref` when no strong references to the event still exist.

However, entries will also be left in `_event_metadata_cache`, which could balloon. This is currently an unsolved problem. Perhaps when an entry is removed from `_event_ref` through a lack of strong references, that could trigger an invalidation in `_event_metadata_cache` as well?
Full room purge (new)
- The room is purged from the database.
- The room ID is used to invalidate the entry in the top-level dictionary of `_event_metadata_cache`.
- When an event is fetched from the `_get_event_cache` or `_event_ref`, we extract the room ID and check whether the event is also listed in the `_event_metadata_cache`. If not, we assume that it has been cleared intentionally and return `None` (rather than treat it as a cache miss, thus assuming it is also not in the database).
Retrieving an event

Let's take a look at the new flow for retrieving an event from the cache/db:

```mermaid
graph TD;
    A[Calling Function]-->B[look up event ID in _get_event_cache];
    B-->|entry found|C[Check _event_metadata_cache with event.room_id];
    B-->|entry not found|D[look up event ID in _event_ref];
    C-->|entry found|J[populate caches as necessary and return event];
    C-->|entry not found|E[invalidate entry in _get_event_cache and _event_ref];
    D-->|entry found|C;
    D-->|entry not found|F[look up event in the database];
    E-->H[return None];
    F-->|entry found|G[Add entries to _get_event_cache, _event_ref, _event_metadata_cache];
    F-->|entry not found|I[return None];
    G-->K[return event]
```
In text form:
- When a calling method fetches an event:
  - attempt to fetch the event from the `_get_event_cache` or `_event_ref` cache; if found then
    - using `event.room_id`, check if an entry for the room ID and event ID exists in `_event_metadata_cache`
      - if so, populate caches as necessary (populate `_get_event_cache` if the event was found in `_event_ref`) and return the (potentially redacted) cached `EventBase`.
      - if not, invalidate the entry from `_get_event_cache` and `_event_ref`, and return `None`.
  - otherwise check the database; if found then
    - insert an entry into `_get_event_cache` and `_event_ref`
    - using `event.room_id`, insert an entry into `_event_metadata_cache`
    - return the (potentially redacted) `EventBase`.
  - otherwise return `None`.
To invalidate an event in `_get_event_cache` and `_event_ref` correctly, the relevant entry needs to be invalidated in `_event_metadata_cache`.
Does this sound like it could work? The benefit is a `_get_event_cache` that does not need to be invalidated upon changing the redaction or rejection status of an event. This aids in the case of sharing a cache between homeservers (though all homeservers would need to have configured `redaction_retention_period` the same). I'm not hoping to solve that problem here, just to help lay the groundwork for it.
The real goal is to allow removing all events in a room from the `_get_event_cache` and `_event_ref`, without any loss of functionality/theoretical performance.
> Note: For the event being persisted, I'm not actually sure why we invalidate the `_get_event_cache` and `_event_ref` in this instance. Perhaps "just in case"?
One reason is that an event can go from `outlier` to non-`outlier` (de-outlier). See just below, where we update that `outlier` status; the `invalidate_get_event_cache_after_txn` runs after the txn.

https://github.com/matrix-org/synapse/blob/2c237debd3476bcc45a76e360b0cb33032b23045/synapse/storage/databases/main/events.py#L451-L455
@MadLittleMods Ah right! I missed that it would be running after the txn, and thus after everything below it had run as well. And thanks for pointing out another invalidation case!
It does raise the question of whether we should also store outlier status in the proposed `_event_metadata_cache`, instead of the event cache.
When we transition an event from outlier to ex-outlier (`PersistEventsStore._update_outliers_txn`), we don't overwrite the contents of the event in the database - we only update the "outlier" bit in its internal metadata (which requires invalidating `_get_event_cache` and `_event_ref`).

Perhaps we could then store outlier status in `_event_metadata_cache` as well?
If the retrieval is transparent then that sounds fine 🤷
@richvdh and I spoke out of band to discuss this. Out of that meeting came a couple of simple changes that will resolve https://github.com/matrix-org/synapse/issues/11521: https://github.com/matrix-org/synapse/pull/14161 and https://github.com/matrix-org/synapse/pull/14164. Assuming those merge, that will unblock me for https://github.com/matrix-org/synapse/issues/4720, which was the current motivation for this change.
Those PRs don't allow correctly invalidating the event caches when purging a room though - they just eliminate some of the symptoms. To fix the cause, we still need the ability to invalidate the event caches by room ID.
Some notes on augmenting the above proposal as a result of our chat (full minutes):
- We should invalidate entries in the event cache immediately when purging them from a room. That way, things that access the event caches directly won't get confused (ok, well, maybe we can just stop doing that; also, making sure we check locally means that we can have an external event cache shared with other homeservers).
- Above we proposed to keep `rejected_reason` as a field on `EventBase`, as it is convenient to use to tie together an event and its rejected state when passing the event around. This does mean that an event in the `_get_event_cache` may have a `rejected_reason` field that's out of sync with the `rejected_reason` stored in `_event_metadata_cache`. If something reaches into the `_get_event_cache` without consulting `_event_metadata_cache`, then it may get an incorrect/outdated value for `rejected_reason`.
  - Also note that typically we consider `EventBase` objects immutable. If we update the `rejected_reason` field on an `EventBase`, we'd typically make a new `EventBase` and replace the old one.
  - Also note that we track rejections in a table called `rejections`, but we've been slowly moving this value directly into the `events` table (via the `events_populate_state_key_rejections` background update), which makes having multiple homeservers share the same events table harder (noted in https://github.com/matrix-org/synapse/issues/11496#issuecomment-1273581202).
  - It was proposed to not have `rejected_reason` tracked in `_event_metadata_cache` after all, and instead just keep it in `EventBase`.
- Keep redaction status in `_event_metadata_cache`.
- Note that not all situations require looking up redaction status (when sending an event over federation, you just want the requested event and don't care if it's been redacted).
In the end, we may just want to consider having a separate dict of Room ID -> Event ID to fix the problems with invalidating the event caches by room ID, and then a separate piece of work after that to address multi-homeserver caching use cases.
What does "multiple homeservers" mean in this context (and other references like "multi-homeserver", "shared with other homeservers")? How can I share a cache with multiple homeservers? Multiple workers?
If you think of EMS' use case, where you have many homeservers together in a kubernetes cluster, it'd be nice if all of those homeservers could share a single external event store. And on top of that an external event cache. That way you don't duplicate that information across every one of your hundreds of servers.
It should be feasible as long as the data you store is consistent across all homeservers. Event content is; whether the event has been rejected is not necessarily. The latter is currently stored in the event caches (and now the `events` table in the database), so moving that to a single store that's shared amongst homeservers is trickier. You also need to think about access controls for events - but if you trust all of the homeservers in your cluster, you may be able to get away with just doing that homeserver-side.