matrix-rust-sdk icon indicating copy to clipboard operation
matrix-rust-sdk copied to clipboard

feat: Implement `EventCacheStoreLock::lock()` with poison error, and `::lock_unchecked`

Open Hywan opened this issue 1 year ago • 6 comments

A flask of poison

A flask of poison.


Since https://github.com/matrix-org/matrix-rust-sdk/pull/4192, the event cache has a cross-process lock via EventCacheLock. It brings a lock method. Great! However, the caller is not aware if the lock has been acquired from the same holder or from another holder. The holder refers to the process name usually. If the holder of the lock changes, it means the in-memory data might now be invalid! That's absolutely terrifying. Imagine the following scenario:

  • the main app process adds events inside the event cache store
  • the main app process is paused
  • the push notification process is fired, and adds events inside the event cache store
  • the main app process is resumed
  • now what? how this process is able to know its data are invalid and should be reloaded?

Poor main app process 😢. We must help this poor main app process, don't we?

Hence this PR.

It introduces a generation counter. The principle is really simple: every time the lock is acquired from another holder, the generation is incremented by one. The generation does not change if the holder stays the same between lock acquisitions.

So.

  • The first patch do that: a generation value is added inside the stores (e.g. matrix-sdk-sqlite).
  • Next, EventCacheStoreLock::lock is renamed lock_unchecked. Why do we want an unchecked version of the lock? What does it even mean? A lock can be poisoned, but sometimes we just don't care! If we want to add a media in the cache, there is no in-memory invalidation involved. Using lock_unchecked is perfect in this case.
  • After that, we do a bit of Rust magic with BackingStore being automatically implemented for all Arc<T> where T: BackingStore.
  • And now we are ready to implement LockableEventCacheStore::is_poisoned. This “poisoning” API is restricted to the event cache; I didn't want to touch the CrossProcessStoreLock API. Maybe that would be useful for all cross-process store locks later, but for now, only event cache, nah!
  • All pieces are ready. We can implement EventCacheStoreLock::lock().
  • Finally, the test parts.

It's better to review this PR commit-by-commit for your own sanity.


  • Address https://github.com/matrix-org/matrix-rust-sdk/issues/3280

Hywan avatar Nov 18 '24 16:11 Hywan

Codecov Report

:x: Patch coverage is 98.46154% with 1 line in your changes missing coverage. Please review. :white_check_mark: Project coverage is 85.04%. Comparing base (21bb85a) to head (cd4c12e). :warning: Report is 2622 commits behind head on main.

Files with missing lines Patch % Lines
...rates/matrix-sdk-base/src/event_cache/store/mod.rs 96.29% 1 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4285   +/-   ##
=======================================
  Coverage   85.04%   85.04%           
=======================================
  Files         274      274           
  Lines       30043    30080   +37     
=======================================
+ Hits        25550    25583   +33     
- Misses       4493     4497    +4     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Nov 18 '24 16:11 codecov[bot]

I will hold off on reviewing until you let me know the ideas have settled a bit or you want my input.

I also would like to share logic where it makes sense with the crypto store.

andybalaam avatar Nov 19 '24 15:11 andybalaam

I totally agree we should share the logic with the crypto store. However it implies a non-trivial work to move this generation and lock poison mechanism into the CrossProcessStoreLock API. I prefer to go step-by-step, especially because I'm running out of time to finish the event cache persistent storage. This PR is for the event cache. I agree a refactoring with all cross-process store locks will be necessary once this patch will land and once the design has proven to be mature. Moreover, the work on the crypto store isn't finished and I'm not confident I can rewrite and complete it. I think we should have a team discussion about that: to come with a solid design together. Having a bit of “duplication” isn't dramatically bad here: it's a specific solution, and I agree having a generic solution would be an improvement, but I don't see that as a blocker.

Hywan avatar Nov 20 '24 08:11 Hywan

After a team discussion, here is the new plan:

  • create new crate: matrix-sdk-cross-process-lock
  • move and rename CrossProcessStoreLock to CrossProcessLock inside matrix-sdk-cross-process-lock
  • improve the CrossProcessLock to include the generation mechanism introduced in #4285
  • add the lock_unchecked and lock+PoisonError logic inside this crate too, including the lock guards etc.
  • create a trait to represent a “poisonable” lock

Once we have that, the following items can be triggered:

  • matrix-sdk-crypto uses this new trait (+ friendly default to use CrossProcessStoreLock maybe?)
  • either:
    • update the state store to be cross-process aware, then event cache store and crypto store use the state store
    • use a new front type, like Stores, that is cross-process aware, and that gives access to all the stores
  • the event cache can implements EventCacheData that contains all in-memory data + owned access to the store (and reload in-memory data in case of poison)
  • the crypto can implements CryptoData that contains all in-memory data + owned access to the store (and reload in-memory data in case of poison) <- is it doable?
  • same dance for the state store?

Hywan avatar Nov 20 '24 13:11 Hywan

What's the state of this PR, it sounds like it's abandoned?

poljar avatar Jan 17 '25 14:01 poljar

It is not. We need to do this, https://github.com/matrix-org/matrix-rust-sdk/pull/4285#issuecomment-2488655073. If we close this PR, we might lose this project. Maybe we can open an issue to not forget the plan, and link the issue to this (closed) PR?

Hywan avatar Jan 20 '25 12:01 Hywan

I don't think it makes sense to keep large PRs that have bitrot like this for a long time, especially when we know there's much more work to do. Thanks for the initial investigation, and please reopen if/when this is ready to review :)

bnjbvr avatar Aug 14 '25 05:08 bnjbvr