element-ios icon indicating copy to clipboard operation
element-ios copied to clipboard

Decryption failure caused one-time-key to be discarded

Open richvdh opened this issue 2 years ago • 10 comments

Steps to reproduce

n/a

Outcome

See https://github.com/matrix-org/element-ios-rageshakes/issues/23065; in particular https://rageshakes.element.io/api/listing/2023-02-07/145153-6DJDEHFV/console.log.gz at 15:51:12.009:

2023-02-07 15:51:12.009 Element[22267:7676502] [MXCryptoSDK]  WARN receive_sync_changes:receive_to_device_event{sender="@Amandine:matrix.org" event_type="m.room.encrypted" message_id="677c9970-41c9-48c1-afa9-20f8b6f22920"}: matrix_sdk_crypto::olm::account: Failed to create a new Olm session from a pre-key message: InboundCreation(MissingOneTimeKey("curve25519:gw9P5us0F4J3Zs2s0TUfel4zC1qOSgpeK0U3zQuBfxc")) sender_key="curve25519:yCzeJlVPpPrg6prkD4l4UezGxye3aXJE9Rd40wzNNhg" session_keys=SessionKeys { identity_key: "yCzeJlVPpPrg6prkD4l4UezGxye3aXJE9Rd40wzNNhg", base_key: "QjzU4LLyreQNMBTczt4bI1+wRzexKfqdrzYoGxEwf04", one_time_key: "gw9P5us0F4J3Zs2s0TUfel4zC1qOSgpeK0U3zQuBfxc" }

Previously, this device failed to decrypt an event on this Olm session due to InvalidMac (see element-hq/element-web#7479). It looks as though our copy of the one-time-key was discarded during that earlier failed decryption, which means that we now have no hope of decrypting any further messages.

I assert that we should not delete one-time-keys on unsuccessful decryption.

Your phone model

iPhone 11 Pro

Operating system version

iOS 16.2

Application version

1.10.0

Homeserver

No response

Will you send logs?

Yes

richvdh avatar Apr 06 '23 15:04 richvdh

/cc @Anderas @poljar: Possibly this is in the rust-sdk layer? Either way I think this is something we can and should improve.

richvdh avatar Apr 06 '23 15:04 richvdh

I assert that we should not delete one-time-keys on unsuccessful decryption.

Are you sure that this is happening? We may have a bug, but it certainly isn't the intended way of operation:

https://github.com/matrix-org/vodozemac/blob/fb609ca1e4df5a7a818490ae86ac694119e41e71/src/olm/account/mod.rs#L262-L269

And we have a unit test that gives me some confidence that this is not happening:

https://github.com/matrix-org/vodozemac/blob/fb609ca1e4df5a7a818490ae86ac694119e41e71/src/olm/account/mod.rs#L839-L849

poljar avatar Apr 06 '23 15:04 poljar

Are you sure that this is happening?

no. I'm not, but the logs certainly gave that impression. In that case, I very much wonder what did happen here to cause a MissingOneTimeKey error: clearly it got further previously with the same one-time-key.

richvdh avatar Apr 06 '23 16:04 richvdh

perhaps we decided we needed to generate more one-time-keys for the server, which meant we discarded our private copies of some we thought were expired? ie, https://github.com/vector-im/element-web/issues/3309

richvdh avatar Apr 06 '23 16:04 richvdh

perhaps we decided we needed to generate more one-time-keys for the server, which meant we discarded our private copies of some we thought were expired? ie, vector-im/element-web#3309

That could be the case, but we only keep 50 on the server while vodozemac has the ability to store many more.

I'll take a look at those logs next week when I'm maintainer.

poljar avatar Apr 06 '23 17:04 poljar

Tracked in rust-sdk repo https://github.com/matrix-org/matrix-rust-sdk/issues/1761

BillCarsonFr avatar Apr 11 '23 07:04 BillCarsonFr

@kegsay to write a test

richvdh avatar Feb 27 '24 14:02 richvdh

New test planned in complement crypto to test that

BillCarsonFr avatar Feb 27 '24 14:02 BillCarsonFr

It might also be related to some mutli-process access? A first machine handles the pre-key message, decrypt it and then drop the OTK (that is in Account), but the inbound session is not yet persisted in store. The second machine takes other with the account that already drops the OTK but the inbound session not stored?

BillCarsonFr avatar Feb 28 '24 09:02 BillCarsonFr

It might also be related to some mutli-process access?

I think that his is exactly what happened here https://github.com/matrix-org/matrix-rust-sdk/issues/3110.

poljar avatar Feb 28 '24 11:02 poljar

It sounds like we think this is probably due to something like https://github.com/matrix-org/matrix-rust-sdk/issues/3110, and that the initial diagnosis was incorrect. I think we should close this.

richvdh avatar Apr 26 '24 18:04 richvdh