element-meta
Users whose servers were unreachable will receive undecryptable messages due to failed OTK claim
- Alice tries to send a message in a room that includes Bob.
- Bob's server is offline; Alice's OTK claim therefore times out. Alice sends the message anyway without sharing the key with Bob.
- Later, Bob comes back online. He receives the room message but not the keys.
Even if Alice subsequently sends another message using the same session, and tries again to share the session key with Bob, it is likely that she will share the megolm ratchet starting at that second message rather than the first one.
Bob will never be able to decrypt the message.
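To illustrate why sharing the ratchet at the second message does not help with the first, here is a toy sketch. It is not Megolm (the real ratchet has a different construction); it assumes only that the ratchet is one-way, so a key shared at index 1 cannot be wound back to index 0. The names `advance` and `key_at` are made up for this example.

```python
import hashlib
from typing import Optional


def advance(key: bytes) -> bytes:
    """One-way step: derive the next toy ratchet key from the current one."""
    return hashlib.sha256(b"toy-ratchet" + key).digest()


def key_at(shared_key: bytes, shared_index: int, wanted_index: int) -> Optional[bytes]:
    """Return the key for wanted_index, or None if it precedes the shared index."""
    if wanted_index < shared_index:
        return None  # a one-way ratchet cannot be run backwards
    key = shared_key
    for _ in range(wanted_index - shared_index):
        key = advance(key)
    return key


# Alice's session: the first message uses index 0, the second uses index 1.
alice_key_0 = hashlib.sha256(b"session-seed").digest()
alice_key_1 = advance(alice_key_0)

# The OTK claim failed for the first message, so Bob only ever
# receives the ratchet as it stood at index 1.
bob_shared_key, bob_shared_index = alice_key_1, 1

can_decrypt_first = key_at(bob_shared_key, bob_shared_index, 0) is not None
can_decrypt_second = key_at(bob_shared_key, bob_shared_index, 1) == alice_key_1
```

In this model `can_decrypt_first` is false and `can_decrypt_second` is true, which matches the scenario above: Bob can read later messages but never the first one.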
Tasks, with T-shirt sizes
Spec side:
- [ ] Update MSC4081; we need to add unstable prefixes (S)
Server side:
- [ ] Fix https://github.com/element-hq/synapse/issues/11374 (M)
- [ ] Extend `/keys/upload` impl and `e2e_fallback_keys_json` table to record "eager_share" flag (S). Remember to add to `synapse_port_db`.
- [ ] Trigger `m.device_list_update` when fallback keys are updated (S)
- [ ] Include details of fallback_keys in `m.device_list_update` EDU (L)
- [ ] When we receive fallback_keys in `m.device_list_update` EDU, stash them in `e2e_fallback_keys_json` (or do we need a separate table?) (L)
- [ ] Update `/keys/claim` implementation not to set `used` flag on `eager_share` keys, in both sqlite and postgres impls (S)
- [ ] Update `/keys/claim` implementation to fall back to the local store when the remote server is inoperative.
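A rough sketch of how the last two `/keys/claim` tasks could fit together, using an in-memory stand-in rather than Synapse's real storage layer. Everything here (`ToyKeyStore`, `FallbackKey`, `claim_with_fallback`, `RemoteUnreachable`) is hypothetical and only models the intended behaviour: an `eager_share` key is handed out without being marked `used`, and the locally stashed copy is consulted when the remote server cannot be reached.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class RemoteUnreachable(Exception):
    """Raised by the (hypothetical) federation call when the remote is down."""


@dataclass
class FallbackKey:
    key: str
    eager_share: bool = False
    used: bool = False


@dataclass
class ToyKeyStore:
    # (user_id, device_id) -> FallbackKey; stands in for e2e_fallback_keys_json
    fallback: Dict[Tuple[str, str], FallbackKey] = field(default_factory=dict)

    def claim(self, user_id: str, device_id: str) -> Optional[str]:
        fk = self.fallback.get((user_id, device_id))
        if fk is None:
            return None
        if not fk.eager_share:
            fk.used = True  # only ordinary fallback keys get flagged as used
        return fk.key


def claim_with_fallback(
    local: ToyKeyStore,
    query_remote: Callable[[str, str], Optional[str]],
    user_id: str,
    device_id: str,
) -> Optional[str]:
    """Ask the remote server first; fall back to the local stash if it is down."""
    try:
        return query_remote(user_id, device_id)
    except RemoteUnreachable:
        return local.claim(user_id, device_id)


def offline_remote(user_id: str, device_id: str) -> Optional[str]:
    raise RemoteUnreachable()


store = ToyKeyStore()
store.fallback[("@bob:example.org", "BOBDEVICE")] = FallbackKey(
    "fallback-key-1", eager_share=True
)
claimed = claim_with_fallback(store, offline_remote, "@bob:example.org", "BOBDEVICE")
```

With Bob's server offline, `claimed` is the stashed key and its `used` flag stays unset, so the same key can be handed to other senders too.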
matrix-sdk-crypto:
- [ ] Keep old fallback keys around for longer (M).
- [ ] Ignore `device_unused_fallback_key_types` in `/sync`, and instead rotate keys when the current one is old, or has been used (M).
- [ ] Set `eager_share_fallback_keys` flag in `/keys/upload` request (S)
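The rotation rule in the second task could look roughly like the following. This is only a sketch: the one-week threshold, the `FallbackKeyState` type and the `should_rotate` name are assumptions made for illustration, and it assumes the client has some way of knowing that its current key has been used.

```python
from dataclasses import dataclass

# Hypothetical threshold; the issue does not say how old is "old".
MAX_FALLBACK_KEY_AGE_SECS = 7 * 24 * 3600


@dataclass
class FallbackKeyState:
    created_at: float  # seconds since the epoch
    used: bool         # assumed to be tracked by the client itself


def should_rotate(
    state: FallbackKeyState,
    now: float,
    max_age: float = MAX_FALLBACK_KEY_AGE_SECS,
) -> bool:
    """Rotate when the current key has been used or has reached max_age."""
    return state.used or (now - state.created_at) >= max_age


fresh_unused = should_rotate(FallbackKeyState(created_at=1000.0, used=False), now=2000.0)
fresh_used = should_rotate(FallbackKeyState(created_at=1000.0, used=True), now=2000.0)
old_unused = should_rotate(
    FallbackKeyState(created_at=0.0, used=False), now=MAX_FALLBACK_KEY_AGE_SECS + 1.0
)
```

The decision depends only on the client's own state, not on what the server reports in `/sync`.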
Testing:
- Write a complement-crypto test for this scenario (L)
Duplicate of #2153
Actually I think this is clearer than #2153, so closing the other.
https://github.com/matrix-org/matrix-spec-proposals/pull/4081 proposes a way to fix this.
To port some of the possible solution thoughts from #2153:
- Alice's client should maintain a persisted queue of not-yet-set-up-Olm sessions, and retry
- Alice's server could nudge Alice's client (e.g. by push) if it spots that Bob's server has come back, so Alice's client can retry setting up Olm.
- MSC4081 is all very well, but it doesn't provide a full solution: if Bob has cached a stale fallback key for Alice, the session still won't set up, and Bob will need to be nudged by his server once it learns that Alice's device list has changed. Cf. https://github.com/matrix-org/matrix-spec-proposals/pull/4081/files#r1451581648
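For the first idea above (a persisted queue of not-yet-set-up Olm sessions), a minimal sketch might look like this. It assumes a JSON file as the persistence layer and a caller-supplied `try_setup` callback that returns `True` once the Olm session has been established; `PendingOlmQueue` and both of those choices are hypothetical, not how any existing client does it.

```python
import json
from pathlib import Path
from typing import Callable, Set, Tuple


class PendingOlmQueue:
    """Devices we still owe an Olm session to, surviving client restarts."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.pending: Set[Tuple[str, str]] = set()
        if path.exists():
            # Entries are stored as [user_id, device_id] pairs.
            self.pending = {tuple(entry) for entry in json.loads(path.read_text())}

    def _save(self) -> None:
        self.path.write_text(json.dumps(sorted(self.pending)))

    def add(self, user_id: str, device_id: str) -> None:
        """Record a device whose key claim failed, and persist immediately."""
        self.pending.add((user_id, device_id))
        self._save()

    def retry(self, try_setup: Callable[[str, str], bool]) -> None:
        """Retry every pending device; drop the ones that now succeed."""
        for user_id, device_id in sorted(self.pending):
            if try_setup(user_id, device_id):
                self.pending.discard((user_id, device_id))
        self._save()
```

`retry` could be called on a timer, or when the server nudges the client as suggested in the second bullet.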
We'll need to:
- Deal with device-list-update bugs such as https://github.com/element-hq/synapse/pull/16875
- Push forward MSC4081, including:
  - Grokking @ara4n's feedback above
  - Server-side changes
  - Client-side changes?
- Complement-crypto test
@pmaier1 to check priority, given that this only happens in uncommon use cases.
We concluded that this has low priority: we consider the impact "low" (it only arises in very specific cases) and the effort to fix "high".