UTDs: maubot/mautrix bridges fail to encrypt for EX sessions

Open kegsay opened this issue 10 months ago • 11 comments

We have had at least three reports of UTDs (unable-to-decrypt messages) when Element X is used in conjunction with mautrix bridges, from @jkhsjdhjs:totally.rip, @frebib:nerdhouse.io and Will L. This is a placeholder issue to collect more information and see whether there is something actionable.

Close this issue if:

  • it reaches mid-July and we still know nothing.
  • the reporters report no new UTDs for several weeks.

In Will L's and jkhsjdhjs's case, it looks like room keys failed to be exchanged correctly, which manifests as the room working fine for a while then suddenly failing to decrypt. For frebib, new rooms are most frequently affected, where the bridge is on a different server to the user seeing the UTD.

WhatsApp bridge is repeatedly the culprit, but that could just be due to its popularity.

kegsay avatar Apr 10 '24 13:04 kegsay

Related https://github.com/element-hq/element-x-ios/issues/2263

kegsay avatar Apr 10 '24 15:04 kegsay

Just had this issue again with my mautrix-whatsapp bridge and noticed that the bridge does indeed encrypt the messages for the Element X session (WNZDTMFIRU). However, Element X is still unable to decrypt the message. I have attached the relevant logs from mautrix-whatsapp and the Element X iOS NotificationServiceExtension to this comment in the hope that they are useful.

mautrix-whatsapp.log console-nse.2024-04-20-14.log

jkhsjdhjs avatar Apr 20 '24 20:04 jkhsjdhjs

Failed to decrypt a non-pre-key message with all available sessions errors_by_olm_session=[("Tz1YW/DjvQyQN+PAVmYwx/TSuUco6kBMF2/GeRoIIO4", InvalidMAC(MacError)), ("guGLQ/9DTgDdxVNSGO+b5I6Bipupxyd4i7hOHsSHmGw", InvalidMAC(MacError)), ("euS08VImu5hKNndPrJJZzkiRoGfofi4WjTLFrawY10k", InvalidMAC(MacError)), ("4a5Ake2e7Dh9nR5NZDTxJnO+d3dPcXZ/w/anrduhcZM", InvalidMAC(MacError)), ("Dodt7wc+cqbzdC9Mbt+y6HDshI5tyeyvg07lzMpezQg", InvalidMAC(MacError)), ("w7CTLS0KAVEyDfDiPOSWcqj1JFW8XHoROcf08DGvET8", InvalidMAC(MacError)), ("wxt7uCPcizq23j6CcfRwBXKjyXiH3UYKZjegVPUTNxo", InvalidMAC(MacError)), ("exGrmYijdvRPngQqM4Ebws91giSr0XiqgVFb8NvjqOc", InvalidMAC(MacError)), ("CZYp4cnV5cplf8uL7CZyVWdc55+CRTJxmFUhATyt4pg", InvalidMAC(MacError)), ("aYYnWCd1WhyMvTVpDCDjnHJhgtO5T2QMwWCyY8OWu8s", InvalidMAC(MacError)), ("jViZtkUHSxs8llj1m0OUSVtREIyO+CxeWiTQfk1LV5M", InvalidMAC(MacError))] | crates/matrix-sdk-crypto/src/olm/account.rs:1245 - this feels like it is https://github.com/matrix-org/matrix-rust-sdk/issues/3110 all over again, in which case https://github.com/matrix-org/matrix-rust-sdk/pull/3338 should fix this on EIX.
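For anyone comparing their own logs against the line above, here is a rough sketch of how the `errors_by_olm_session` list could be summarised. It assumes the log line keeps the same `("<session id>", <ErrorKind>(...))` shape; the function name and the shortened sample with placeholder session IDs are made up for illustration and are not part of the SDK.

```python
import re
from collections import Counter

# Shortened sample in the same shape as the log line above.
# The session IDs here are placeholders, not real ones.
SAMPLE = (
    "Failed to decrypt a non-pre-key message with all available sessions "
    'errors_by_olm_session=[("sessionA", InvalidMAC(MacError)), '
    '("sessionB", InvalidMAC(MacError))] '
    "| crates/matrix-sdk-crypto/src/olm/account.rs:1245"
)

# Matches one ("<session id>", <ErrorKind>( pair and captures both parts.
PAIR = re.compile(r'\("([^"]+)",\s*(\w+)\(')


def summarize_olm_errors(line: str) -> Counter:
    """Count how many Olm sessions failed with each error kind."""
    return Counter(kind for _session_id, kind in PAIR.findall(line))


summary = summarize_olm_errors(SAMPLE)
print(summary)
```

If every session in the list fails with the same error kind, as in the log above, that is the pattern being compared with the earlier wedged-session issue.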

@wrjlewis which devices were failing to decrypt WhatsApp messages for you?

kegsay avatar Apr 29 '24 16:04 kegsay

I don't understand what it means for an OLM session to "wedge", but I currently work around this issue by removing the outbound sessions for the affected rooms from the mautrix-whatsapp database, i.e.

DELETE FROM crypto_megolm_outbound_session
WHERE room_id IN (
    '!affected_room1:example.com',
    '!affected_room2:example.com',
    ...
)

This forces mautrix-whatsapp to create new sessions for the respective rooms on the next message, which can be decrypted again by EIX. Does this fit the OLM session wedge theory?
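A minimal sketch of the same workaround as a parameterised script, assuming the bridge uses a SQLite database. Only the table name `crypto_megolm_outbound_session` and the `room_id` column come from the query above; the helper name, the `session` column and the in-memory demo table are assumptions for illustration, and the real schema has more columns. Stopping the bridge and backing up the database first seems a sensible precaution.

```python
import sqlite3


def drop_outbound_sessions(conn: sqlite3.Connection, room_ids: list[str]) -> int:
    """Delete outbound Megolm sessions for the given rooms; return rows removed."""
    if not room_ids:
        return 0
    # One "?" placeholder per room ID, so IDs are never interpolated into the SQL.
    placeholders = ", ".join("?" for _ in room_ids)
    cur = conn.execute(
        f"DELETE FROM crypto_megolm_outbound_session WHERE room_id IN ({placeholders})",
        room_ids,
    )
    conn.commit()
    return cur.rowcount


# Demo against a throwaway in-memory table (hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crypto_megolm_outbound_session (room_id TEXT PRIMARY KEY, session BLOB)"
)
conn.executemany(
    "INSERT INTO crypto_megolm_outbound_session VALUES (?, ?)",
    [("!affected_room1:example.com", b"a"), ("!other_room:example.com", b"b")],
)
removed = drop_outbound_sessions(conn, ["!affected_room1:example.com"])
print(f"removed {removed} outbound session(s)")
```

For a Postgres-backed bridge the same idea applies, but the driver and placeholder syntax would differ.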

Furthermore, going by the wedge theory, shouldn't this issue also occur with messages sent by other E2EE aware parties, like other regular users or other bridges? Shouldn't more users be affected by this?

jkhsjdhjs avatar Apr 30 '24 13:04 jkhsjdhjs

> Failed to decrypt a non-pre-key message with all available sessions errors_by_olm_session=[("Tz1YW/DjvQyQN+PAVmYwx/TSuUco6kBMF2/GeRoIIO4", InvalidMAC(MacError)), ("guGLQ/9DTgDdxVNSGO+b5I6Bipupxyd4i7hOHsSHmGw", InvalidMAC(MacError)), ("euS08VImu5hKNndPrJJZzkiRoGfofi4WjTLFrawY10k", InvalidMAC(MacError)), ("4a5Ake2e7Dh9nR5NZDTxJnO+d3dPcXZ/w/anrduhcZM", InvalidMAC(MacError)), ("Dodt7wc+cqbzdC9Mbt+y6HDshI5tyeyvg07lzMpezQg", InvalidMAC(MacError)), ("w7CTLS0KAVEyDfDiPOSWcqj1JFW8XHoROcf08DGvET8", InvalidMAC(MacError)), ("wxt7uCPcizq23j6CcfRwBXKjyXiH3UYKZjegVPUTNxo", InvalidMAC(MacError)), ("exGrmYijdvRPngQqM4Ebws91giSr0XiqgVFb8NvjqOc", InvalidMAC(MacError)), ("CZYp4cnV5cplf8uL7CZyVWdc55+CRTJxmFUhATyt4pg", InvalidMAC(MacError)), ("aYYnWCd1WhyMvTVpDCDjnHJhgtO5T2QMwWCyY8OWu8s", InvalidMAC(MacError)), ("jViZtkUHSxs8llj1m0OUSVtREIyO+CxeWiTQfk1LV5M", InvalidMAC(MacError))] | crates/matrix-sdk-crypto/src/olm/account.rs:1245 - this feels like it is matrix-org/matrix-rust-sdk#3110 all over again, in which case matrix-org/matrix-rust-sdk#3338 should fix this on EIX.
>
> @wrjlewis which devices were failing to decrypt WhatsApp messages for you?

Is it the device IDs you need?

wrjlewis avatar May 01 '24 13:05 wrjlewis

@wrjlewis as a first step, could you confirm whether it's Element X iOS or another client that is having the problem?

richvdh avatar May 01 '24 15:05 richvdh

@jkhsjdhjs:

> I don't understand what it means for an OLM session to "wedge", but I currently work around this issue by removing the outbound sessions for the affected rooms from the mautrix-whatsapp database, i.e....

> This forces mautrix-whatsapp to create new sessions for the respective rooms on the next message, which can be decrypted again by EIX. Does this fit the OLM session wedge theory?

Yes. "Olm", not OLM, by the way: https://gitlab.matrix.org/matrix-org/olm/blob/master/docs/olm.md

> Furthermore, going by the wedge theory, shouldn't this issue also occur with messages sent by other E2EE aware parties, like other regular users or other bridges? Shouldn't more users be affected by this?

Well, I think lots of users are affected by this. It's possible that other clients are better at covering it up by using a new olm session than the bridges.
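To illustrate what "covering it up by using a new olm session" could look like, here is a toy sketch of the rate-limiting decision only. It is not the matrix-rust-sdk or mautrix implementation. The one-hour interval reflects my reading of the Matrix spec's guidance on recovering from undecryptable messages (create a new Olm session and send an `m.dummy` to-device event, at most once per device per hour); the class and method names are hypothetical.

```python
import time

# Assumption: at most one replacement Olm session per sender device per hour.
UNWEDGE_INTERVAL_SECS = 60 * 60


class UnwedgeTracker:
    """Decides whether to start a fresh Olm session after a decryption failure."""

    def __init__(self, now=time.monotonic):
        self._now = now            # injectable clock, handy for testing
        self._last_attempt = {}    # sender device key -> time of last new session

    def should_start_new_session(self, sender_key: str) -> bool:
        """Return True if a replacement session may be created for this device now."""
        now = self._now()
        last = self._last_attempt.get(sender_key)
        if last is not None and now - last < UNWEDGE_INTERVAL_SECS:
            return False  # a replacement was created recently; do not create another
        self._last_attempt[sender_key] = now
        return True


tracker = UnwedgeTracker()
if tracker.should_start_new_session("sender-curve25519-key"):
    # A real client would create the Olm session here and send m.dummy over it.
    print("would create a new Olm session and send m.dummy")
```

A sender that never switches to the replacement session would keep encrypting with the broken one, which fits the symptoms described above.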

richvdh avatar May 01 '24 15:05 richvdh

> @wrjlewis as a first step, could you confirm whether it's Element X iOS or another client that is having the problem?

Ah yes, it's always just on EX iOS for me. I also have FluffyChat and Element iOS clients, neither of which showed the issue.

wrjlewis avatar May 01 '24 15:05 wrjlewis

This feels like it may already be fixed then. We'll need to wait until https://github.com/matrix-org/matrix-rust-sdk/pull/3338 lands in a proper release which people can test.

kegsay avatar May 02 '24 07:05 kegsay

Should land Monday.

kegsay avatar May 08 '24 09:05 kegsay

This has been rolled out to Element X for a while now. If anyone does see mautrix bridge problems we need bug reports.

Will close this issue in July if there are no mautrix bridge problems.

kegsay avatar Jun 05 '24 14:06 kegsay