Users whose servers were unreachable when you logged in will send you undecryptable messages
Migrating from https://github.com/element-hq/synapse/issues/2165:
- Alice logs in on a new device
- Alice's server tries to tell everyone about her new device
- Bob's server is unreachable at that moment
- Later, Bob sends a message. He doesn't know about Alice's new device. Alice sees a UISI.
A solution to this might go something along the lines of:
- We change the send-to-device API to group messages for each target user together.
- Now, when Bob tries to send the key for a megolm session to Alice, he includes a hash of Alice's device list.
- When Alice's server receives that batch of to-device messages, it can tell if the list is outdated, and send an indication back to Bob via Bob's server.
- Bob's client updates its copy of Alice's device list and tries again.
This would require Bob's client to keep a journal of which users it tried to send a given megolm key to, which might also be useful for dealing with wedged olm sessions more promptly (https://github.com/element-hq/element-meta/issues/1992). (That might well be better done after MegolmV2?)
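To make the idea a bit more concrete, here is a rough sketch of the hash check and the journal. Everything in it is an assumption for illustration: the hash construction (SHA-256 over a sorted JSON list of device IDs), the `KeyShareJournal` class, and all function names are hypothetical and not part of any existing Matrix API or SDK.

```python
import hashlib
import json
from dataclasses import dataclass, field


def device_list_hash(device_ids):
    """Hash a device list in a canonical form (hypothetical construction)."""
    canonical = json.dumps(sorted(set(device_ids)), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sender_list_is_current(claimed_hash, actual_device_ids):
    """Alice's server: does the hash Bob sent match Alice's real device list?"""
    return claimed_hash == device_list_hash(actual_device_ids)


@dataclass
class KeyShareJournal:
    """Bob's client: which devices each megolm session key has been sent to."""
    # (session_id, user_id) -> set of device IDs the key was sent to
    sent: dict = field(default_factory=dict)

    def record(self, session_id, user_id, device_ids):
        self.sent.setdefault((session_id, user_id), set()).update(device_ids)

    def missing(self, session_id, user_id, current_device_ids):
        """Devices that exist now but were never sent this session key."""
        already = self.sent.get((session_id, user_id), set())
        return set(current_device_ids) - already


# Bob only knows about Alice's phone; Alice has since added a laptop.
bobs_view = ["ALICE_PHONE"]
alices_devices = ["ALICE_PHONE", "ALICE_LAPTOP"]

journal = KeyShareJournal()
journal.record("session1", "@alice:example.org", bobs_view)

# Alice's server spots the mismatch and would signal it back to Bob.
outdated = not sender_list_is_current(device_list_hash(bobs_view), alices_devices)

# After refreshing the device list, Bob's client retries for the missing devices.
resend_to = journal.missing("session1", "@alice:example.org", alices_devices)
```

The journal is what lets the retry target only the devices that never received the key, rather than resending to everyone.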
When Bob's server sends the to-device messages to Alice's server, it could also include the stream_id of the latest m.device_list_update that it received for that user, and Alice's server could give an indication as to whether Bob's server is up-to-date or not.
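A rough sketch of what that server-side check might look like. The `DeviceListTracker` class and its method names are made up for illustration, and it assumes Alice's server remembers the stream_id of the most recent m.device_list_update it sent for each local user. It compares for equality rather than ordering, since this sketch does not assume stream_ids are monotonic.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceListTracker:
    """Alice's server: latest device list update sent per local user (hypothetical)."""
    # user_id -> stream_id of the newest m.device_list_update sent for that user
    latest_stream_id: dict = field(default_factory=dict)

    def note_update_sent(self, user_id, stream_id):
        self.latest_stream_id[user_id] = stream_id

    def sender_is_up_to_date(self, user_id, sender_last_seen):
        """True if the sending server has seen our newest update for this user."""
        latest = self.latest_stream_id.get(user_id)
        if latest is None:
            # No update has ever been sent, so there is nothing to have missed.
            return True
        return sender_last_seen == latest


tracker = DeviceListTracker()
tracker.note_update_sent("@alice:example.org", 6)
tracker.note_update_sent("@alice:example.org", 7)  # the new login Bob's server missed

stale = tracker.sender_is_up_to_date("@alice:example.org", 6)
fresh = tracker.sender_is_up_to_date("@alice:example.org", 7)
```

Here `stale` is False, so Alice's server would tell Bob's server to resync Alice's device list, while `fresh` is True.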
> it can tell if the list is outdated, and send an indication back to Bob via Bob's server
There is a race condition here where this check returns false (up-to-date list) and before the event is delivered the client logs in on a new device.
> There is a race condition here where this check returns false (up-to-date list) and before the event is delivered the client logs in on a new device.
I'd argue that's less a race, and more that, objectively, Alice logged in after the message was sent, and therefore shouldn't expect to decrypt the event any more than she would if she logged in 3 weeks later. In other words, that case is #2313 rather than this issue, which is specifically about the parallelism introduced by federation.