Lost OTK, leading to "One time key already exists" error and later UTDs
This is a reprise of an old issue (https://github.com/matrix-org/matrix-rust-sdk/issues/1415) which we thought we'd fixed, but which seems to be back.
Server-side and client-side logs suggest that Element X iOS is creating one-time keys, uploading them to the server, and then forgetting about them.
This essentially guarantees that, at some point down the line, the user is going to receive an undecryptable message.
The manifestation is that both server-side and client-side logs are full of errors about "One time key signed_curve25519:AAAAAAAAAUM already exists."
The client retries every few seconds, so it's also a waste of bandwidth on both sides.
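A toy model of the sequence above may make the failure mode easier to see. This is a minimal Python sketch, not matrix-rust-sdk or Synapse code: `Server`, `Account`, `forget_and_retry` and the zero-padded decimal key IDs are all hypothetical (real key IDs are base64, like `AAAAAAAAAUM`), and the server check is simplified to "the same key ID with different key material is rejected".

```python
import copy
import secrets
from dataclasses import dataclass, field


class OneTimeKeyExists(Exception):
    """Stand-in for the homeserver's "One time key ... already exists" error."""


@dataclass
class Server:
    # Toy homeserver OTK store: key ID -> public key (hypothetical, simplified).
    otks: dict[str, str] = field(default_factory=dict)

    def upload(self, key_id: str, public_key: str) -> None:
        existing = self.otks.get(key_id)
        if existing is not None and existing != public_key:
            raise OneTimeKeyExists(
                f"One time key signed_curve25519:{key_id} already exists."
            )
        self.otks[key_id] = public_key


@dataclass
class Account:
    # Toy client state: a counter from which key IDs are derived, plus the
    # private halves of the keys generated so far.
    next_key_id: int = 0
    private_keys: dict[str, str] = field(default_factory=dict)

    def generate_key(self) -> tuple[str, str]:
        key_id = f"{self.next_key_id:011d}"
        self.next_key_id += 1
        secret = secrets.token_hex(16)  # fresh randomness on every call
        self.private_keys[key_id] = secret
        return key_id, "pub-" + secret  # toy "public key" derived from the secret


def forget_and_retry(retries: int = 3) -> tuple[list[str], bool]:
    server = Server()
    persisted = Account(next_key_id=323)

    live = copy.deepcopy(persisted)
    key_id, public_key = live.generate_key()
    server.upload(key_id, public_key)  # accepted: other devices can now claim it

    # The failure being modelled: `live` is never saved, so the next load
    # starts again from the stale persisted copy.
    reloaded = copy.deepcopy(persisted)
    new_id, new_public = reloaded.generate_key()  # same ID, new key material

    errors = []
    for _ in range(retries):  # every retry fails in exactly the same way
        try:
            server.upload(new_id, new_public)
        except OneTimeKeyExists as err:
            errors.append(str(err))

    # A sender who claimed the original key encrypts to a key whose private
    # half no longer exists on this device: a future UTD.
    can_decrypt = server.otks[key_id] == "pub-" + reloaded.private_keys[key_id]
    return errors, can_decrypt
```

Calling `forget_and_retry()` returns three identical "One time key signed_curve25519:00000000323 already exists." errors and `can_decrypt == False`: the retry loop can never succeed, and the key the server is still handing out is undecryptable.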
It's also problematic that there is no indication in the UI that there is any problem, so the first we know of it is when the user receives a UTD several weeks later.
The linked rageshake from Amandine suggests that the cross-process lock isn't doing what it's supposed to.
I've previously questioned whether the cross-process lock actually works. This looks like more evidence that it does not.
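To illustrate why a lock that doesn't hold would produce exactly this symptom, here is a second toy Python sketch. It assumes (this is not confirmed by the logs) that two processes sharing one crypto store, for example the main app and a notification extension, each load the OTK counter, generate and upload a key, then save. `upload`, `simulate` and the dict-based store are hypothetical, and only the counter is modelled; in a real store the overwritten state would also include private keys, which is how an uploaded key gets forgotten.

```python
import secrets


def upload(server: dict[int, str], key_id: int, public_key: str) -> bool:
    """Toy homeserver check: a reused key ID with different key material is rejected."""
    if key_id in server and server[key_id] != public_key:
        return False  # "One time key ... already exists."
    server[key_id] = public_key
    return True


def simulate(with_lock: bool) -> list[bool]:
    """Return the upload result for each of two processes (hypothetical model)."""
    store = {"next_key_id": 323}  # shared persisted state; only the counter is modelled
    server: dict[int, str] = {}
    results = []

    if with_lock:
        # Modelled as fully serialized execution: each process holds the lock
        # across load -> generate -> upload -> save, so the second process
        # loads the counter the first one saved.
        for _process in ("main app", "notification extension"):
            key_id = store["next_key_id"]
            results.append(upload(server, key_id, secrets.token_hex(16)))
            store["next_key_id"] = key_id + 1
    else:
        # No effective lock: both processes load the same counter before either
        # saves, so both derive the same key ID; the last save wins.
        loaded = [store["next_key_id"], store["next_key_id"]]
        for key_id in loaded:
            results.append(upload(server, key_id, secrets.token_hex(16)))
            store["next_key_id"] = key_id + 1
    return results
```

`simulate(False)` yields `[True, False]` (the second upload hits "already exists"), while `simulate(True)` yields `[True, True]`.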
@poljar will make sure this is reported to Sentry so we can see how many people are affected.
PR is here: https://github.com/matrix-org/matrix-rust-sdk/pull/5496
PR to report this only once per Client is here: https://github.com/matrix-org/matrix-rust-sdk/pull/5588. I forgot about this despite @richvdh's warnings that this would be a problem. 🤦
Now that we have some metrics to report on this, it appears that there are very few affected users. Accordingly, we're going to deprioritise it.
(Aside: it appears that there may also be an issue in EW which causes lost OTKs, but we haven't got any rageshakes from affected users and anyway that is likely a separate problem)