feat: Implement `EventCacheStoreLock::lock()` with poison error, and `::lock_unchecked`
A flask of poison.
Since https://github.com/matrix-org/matrix-rust-sdk/pull/4192, the event cache has a cross-process lock via EventCacheLock. It brings a lock method. Great! However, the caller is not aware if the lock has been acquired from the same holder or from another holder. The holder refers to the process name usually. If the holder of the lock changes, it means the in-memory data might now be invalid! That's absolutely terrifying. Imagine the following scenario:
- the main app process adds events inside the event cache store
- the main app process is paused
- the push notification process is fired, and adds events inside the event cache store
- the main app process is resumed
- now what? how this process is able to know its data are invalid and should be reloaded?
Poor main app process 😢. We must help this poor main app process, don't we?
Hence this PR.
It introduces a generation counter. The principle is really simple: every time the lock is acquired from another holder, the generation is incremented by one. The generation does not change if the holder stays the same between lock acquisitions.
So.
- The first patch do that: a
generationvalue is added inside the stores (e.g.matrix-sdk-sqlite). - Next,
EventCacheStoreLock::lockis renamedlock_unchecked. Why do we want an unchecked version of the lock? What does it even mean? A lock can be poisoned, but sometimes we just don't care! If we want to add a media in the cache, there is no in-memory invalidation involved. Usinglock_uncheckedis perfect in this case. - After that, we do a bit of Rust magic with
BackingStorebeing automatically implemented for allArc<T>whereT: BackingStore. - And now we are ready to implement
LockableEventCacheStore::is_poisoned. This “poisoning” API is restricted to the event cache; I didn't want to touch theCrossProcessStoreLockAPI. Maybe that would be useful for all cross-process store locks later, but for now, only event cache, nah! - All pieces are ready. We can implement
EventCacheStoreLock::lock(). - Finally, the test parts.
It's better to review this PR commit-by-commit for your own sanity.
- Address https://github.com/matrix-org/matrix-rust-sdk/issues/3280
Codecov Report
:x: Patch coverage is 98.46154% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 85.04%. Comparing base (21bb85a) to head (cd4c12e).
:warning: Report is 2622 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rates/matrix-sdk-base/src/event_cache/store/mod.rs | 96.29% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #4285 +/- ##
=======================================
Coverage 85.04% 85.04%
=======================================
Files 274 274
Lines 30043 30080 +37
=======================================
+ Hits 25550 25583 +33
- Misses 4493 4497 +4
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I will hold off on reviewing until you let me know the ideas have settled a bit or you want my input.
I also would like to share logic where it makes sense with the crypto store.
I totally agree we should share the logic with the crypto store. However it implies a non-trivial work to move this generation and lock poison mechanism into the CrossProcessStoreLock API. I prefer to go step-by-step, especially because I'm running out of time to finish the event cache persistent storage. This PR is for the event cache. I agree a refactoring with all cross-process store locks will be necessary once this patch will land and once the design has proven to be mature. Moreover, the work on the crypto store isn't finished and I'm not confident I can rewrite and complete it. I think we should have a team discussion about that: to come with a solid design together. Having a bit of “duplication” isn't dramatically bad here: it's a specific solution, and I agree having a generic solution would be an improvement, but I don't see that as a blocker.
After a team discussion, here is the new plan:
- create new crate:
matrix-sdk-cross-process-lock - move and rename
CrossProcessStoreLocktoCrossProcessLockinsidematrix-sdk-cross-process-lock - improve the
CrossProcessLockto include the generation mechanism introduced in #4285 - add the
lock_uncheckedandlock+PoisonErrorlogic inside this crate too, including the lock guards etc. - create a trait to represent a “poisonable” lock
Once we have that, the following items can be triggered:
matrix-sdk-cryptouses this new trait (+ friendly default to useCrossProcessStoreLockmaybe?)- either:
- update the state store to be cross-process aware, then event cache store and crypto store use the state store
- use a new front type, like
Stores, that is cross-process aware, and that gives access to all the stores
- the event cache can implements
EventCacheDatathat contains all in-memory data + owned access to the store (and reload in-memory data in case of poison) - the crypto can implements
CryptoDatathat contains all in-memory data + owned access to the store (and reload in-memory data in case of poison) <- is it doable? - same dance for the state store?
What's the state of this PR, it sounds like it's abandoned?
It is not. We need to do this, https://github.com/matrix-org/matrix-rust-sdk/pull/4285#issuecomment-2488655073. If we close this PR, we might lose this project. Maybe we can open an issue to not forget the plan, and link the issue to this (closed) PR?
I don't think it makes sense to keep large PRs that have bitrot like this for a long time, especially when we know there's much more work to do. Thanks for the initial investigation, and please reopen if/when this is ready to review :)