Decryption failure caused one-time-key to be discarded
Steps to reproduce
n/a
Outcome
See https://github.com/matrix-org/element-ios-rageshakes/issues/23065; in particular https://rageshakes.element.io/api/listing/2023-02-07/145153-6DJDEHFV/console.log.gz at 15:51:12.009:
2023-02-07 15:51:12.009 Element[22267:7676502] [MXCryptoSDK] WARN receive_sync_changes:receive_to_device_event{sender="@Amandine:matrix.org" event_type="m.room.encrypted" message_id="677c9970-41c9-48c1-afa9-20f8b6f22920"}: matrix_sdk_crypto::olm::account: Failed to create a new Olm session from a pre-key message: InboundCreation(MissingOneTimeKey("curve25519:gw9P5us0F4J3Zs2s0TUfel4zC1qOSgpeK0U3zQuBfxc")) sender_key="curve25519:yCzeJlVPpPrg6prkD4l4UezGxye3aXJE9Rd40wzNNhg" session_keys=SessionKeys { identity_key: "yCzeJlVPpPrg6prkD4l4UezGxye3aXJE9Rd40wzNNhg", base_key: "QjzU4LLyreQNMBTczt4bI1+wRzexKfqdrzYoGxEwf04", one_time_key: "gw9P5us0F4J3Zs2s0TUfel4zC1qOSgpeK0U3zQuBfxc" }
Previously, this device failed to decrypt an event on this Olm session due to InvalidMac (see element-hq/element-web#7479). It looks as though our copy of the one-time-key was discarded during that earlier failed decryption, which means that we now have no hope of decrypting any further messages.
I assert that we should not delete one-time-keys on unsuccessful decryption.
Your phone model
iPhone 11 Pro
Operating system version
iOS 16.2
Application version
1.10.0
Homeserver
No response
Will you send logs?
Yes
/cc @Anderas @poljar: Possibly this is in the rust-sdk layer? Either way I think this is something we can and should improve.
I assert that we should not delete one-time-keys on unsuccessful decryption.
Are you sure that this is happening? We may have a bug, but it certainly isn't the intended way of operation:
https://github.com/matrix-org/vodozemac/blob/fb609ca1e4df5a7a818490ae86ac694119e41e71/src/olm/account/mod.rs#L262-L269
And we have a unit test that gives me some confidence that this is not happening:
https://github.com/matrix-org/vodozemac/blob/fb609ca1e4df5a7a818490ae86ac694119e41e71/src/olm/account/mod.rs#L839-L849
Are you sure that this is happening?
no. I'm not, but the logs certainly gave that impression. In that case, I very much wonder what did happen here to cause a MissingOneTimeKey error: clearly it got further previously with the same one-time-key.
perhaps we decided we needed to generate more one-time-keys for the server, which meant we discarded our private copies of some we thought were expired? ie, https://github.com/vector-im/element-web/issues/3309
perhaps we decided we needed to generate more one-time-keys for the server, which meant we discarded our private copies of some we thought were expired? ie, vector-im/element-web#3309
That could be the case, but we only keep 50 on the server while vodozemac has the ability to store many more.
I'll take a look at those logs next week when I'm maintainer.
Tracked in rust-sdk repo https://github.com/matrix-org/matrix-rust-sdk/issues/1761
@kegsay to write a test
New test planned in complement crypto to test that
It might also be related to some mutli-process access? A first machine handles the pre-key message, decrypt it and then drop the OTK (that is in Account), but the inbound session is not yet persisted in store. The second machine takes other with the account that already drops the OTK but the inbound session not stored?
It might also be related to some mutli-process access?
I think that his is exactly what happened here https://github.com/matrix-org/matrix-rust-sdk/issues/3110.
It sounds like we think this is probably due to something like https://github.com/matrix-org/matrix-rust-sdk/issues/3110, and that the initial diagnosis was incorrect. I think we should close this.