
Unable To Decrypt meta issue

Open BillCarsonFr opened this issue 2 years ago • 39 comments

Unable to decrypt Epic Issue

Meta issue relating to all "Unable to Decrypt" problems, a.k.a "Waiting for this message, this may take a while".

[A decryption failure normally manifests when the recipient doesn't receive the keys for a particular encryption session; hence the acronym UISI: "Unknown inbound session ID". Nowadays the acronym UTD (unable to decrypt) is generally preferred.]
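As a rough illustration of the failure described above, here is a toy sketch (no real cryptography, and not code from any Matrix SDK). The `ToyKeyStore` class, its methods, and the event dictionary shape are all hypothetical names chosen for this example; the only idea taken from the text is that an event cannot be decrypted when the recipient holds no key for the session it was encrypted with.

```python
from dataclasses import dataclass, field


class UnknownInboundSession(Exception):
    """Raised when no key is stored for the session an event refers to."""


@dataclass
class ToyKeyStore:
    # Maps session_id -> key material (a plain string in this toy model).
    sessions: dict = field(default_factory=dict)

    def receive_room_key(self, session_id, key):
        # Models the sender's key for a session arriving on this device.
        self.sessions[session_id] = key

    def key_for(self, event):
        # Look up the key for the session named in the event.
        key = self.sessions.get(event["session_id"])
        if key is None:
            # This is the situation a user sees as "Unable to decrypt".
            raise UnknownInboundSession(event["session_id"])
        return key
```

In this model the error clears as soon as `receive_room_key` is called for the missing session, which mirrors why a late-arriving key lets a previously unreadable message become readable.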

Needed information to resolve a reported issue

In order to properly debug an Unable To Decrypt error, we need logs from the receiver of the message (the one seeing the issue) and those from the sender. We can't debug issues without logs from both sides.

In your bug report, please identify which event you can't read. Either include the event ID, or something like "message from Dave at 10:10".

How to send rageshake from Element Web:

  • Click on your avatar in the top left part of the screen
  • Select "All Settings"
  • Select "Help & About"
  • Then click on the submit debug logs button

How to send rageshake from Element Android:

  • Tap on the top right button on the home screen
  • Select Report Bug

How to send rageshake from Element iOS:

  • Open the top left drawer
  • Select the Feedback button at the bottom

Causes of Unable to Decrypt errors

🟢: We believe that this will be fixed with Element R

🕛: Affects a feature which is not yet supported in Element R

We categorize the main sources of UISI errors as follows:

Client Issues

Sender Side

  • [x] 🟢 ~~The sender failed to send the keys. There’s no consistent retry logic on network error, so key sending is fragile. https://github.com/element-hq/element-meta/issues/673. Test: TestClientRetriesSendToDevice~~
  • [x] ~~https://github.com/element-hq/element-web/issues/20962~~
  • [x] 🟢 ~~The keys may be sent slowly:~~
    • [x] ~~https://github.com/element-hq/element-web/issues/24680 Test: complement-crypto#34~~
    • [x] ~~https://github.com/element-hq/element-web/issues/24681~~
  • [x] 🕛 ~~Sharing historical room keys on invite is slow and should prioritise recent messages https://github.com/vector-im/element-web/issues/23778~~
  • [x] ~~https://github.com/vector-im/element-web/issues/24138~~
  • [x] ~~https://github.com/vector-im/element-web/issues/23792~~
  • [x] ~~https://github.com/matrix-org/matrix-rust-sdk/issues/3127~~
  • [ ] https://github.com/element-hq/element-ios/issues/7751
  • [x] https://github.com/matrix-org/matrix-rust-sdk/issues/1415
  • [ ] https://github.com/element-hq/element-web/issues/27285
  • [ ] https://github.com/element-hq/element-web/issues/27334
  • [ ] https://github.com/matrix-org/matrix-rust-sdk/issues/3354
  • [ ] https://github.com/element-hq/element-meta/issues/2425
  • [ ] Delayed membership responses in /sync cause UTDs:
    • [x] https://github.com/matrix-org/matrix-rust-sdk/issues/3622
    • [ ] https://github.com/matrix-org/matrix-js-sdk/issues/4291
  • [ ] https://github.com/element-hq/element-android/issues/8873
  • [x] https://github.com/matrix-org/matrix-rust-sdk/issues/3959
  • [ ] https://github.com/element-hq/element-ios/issues/7844
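The first sender-side item in the list above attributes fragile key sending to the lack of consistent retry logic on network errors. A minimal sketch of what retrying with exponential backoff could look like is shown below. It is a generic illustration, not how any Element client implements it: `send_with_retry`, the `send` callable, and the choice of `ConnectionError` as the failure signal are all assumptions made for this example.

```python
import time


def send_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    `send` is a hypothetical callable that raises ConnectionError on a
    network failure. `sleep` is injectable so the delay can be observed
    or skipped in tests.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error to the caller.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A real client would probably also need to persist the pending request so that retries survive an application restart; that is outside the scope of this sketch.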

Receiver Side

  • [x] 🟢 ~~https://github.com/vector-im/element-meta/issues/762 Test: complement-crypto#37~~
    • [x] ~~https://github.com/element-hq/element-web/issues/23113~~
  • [x] 🟢 ~~https://github.com/vector-im/element-android/issues/5905 (No test since no specific testcase is known - closed because Rust is sufficiently different)~~
  • [x] ~~https://github.com/vector-im/element-web/issues/24450~~
  • [x] 🟢 ~~https://github.com/vector-im/element-web/issues/24682 Test: complement-crypto#38~~
  • [x] ~~https://github.com/element-hq/element-x-android/issues/2520~~
  • [ ] Broken olm: Olm sessions sometimes get out of sync, resulting in undecryptable messages.
    • [x] 🟢 ~~https://github.com/vector-im/element-ios/issues/7479 (Cannot reproduce, so no test)~~
    • [x] ~https://github.com/vector-im/element-ios/issues/7480~
    • [x] 🟢 ~~https://github.com/vector-im/element-web/issues/25723 Test: complement-crypto#35~~
    • [x] ~~Element X iOS: https://github.com/matrix-org/matrix-rust-sdk/issues/3110~~
    • [ ] https://github.com/element-hq/element-meta/issues/2356
    • [ ] https://github.com/element-hq/element-meta/issues/2310
  • [ ] https://github.com/vector-im/element-web/issues/14174
  • [ ] https://github.com/element-hq/element-meta/issues/2421
  • [ ] https://github.com/matrix-org/matrix-rust-sdk/issues/3427
  • [ ] https://github.com/element-hq/element-web/issues/27577
  • [x] https://github.com/matrix-org/matrix-rust-sdk/issues/3768
  • [ ] https://github.com/element-hq/element-x-android/issues/3471
  • [ ] https://github.com/matrix-org/matrix-rust-sdk/issues/3993
  • [ ] https://github.com/element-hq/element-web/issues/28016
  • [ ] https://github.com/element-hq/element-web/issues/28060
  • [ ] https://github.com/matrix-org/matrix-rust-sdk/issues/4033

Server Issues

  • [x] ~To-device messages can take a long time to get sent over federation~
    • [x] ~https://github.com/matrix-org/synapse/issues/15161~
    • [x] ~https://github.com/element-hq/synapse/issues/16680~
    • [ ] https://github.com/element-hq/synapse/issues/8691
  • [x] ~To-device messages may get lost on the server~
    • [x] ~https://github.com/matrix-org/synapse/issues/9533~
    • [x] ~https://github.com/matrix-org/synapse/issues/15335~
    • [x] ~https://github.com/matrix-org/synapse/issues/16681~
  • [x] ~https://github.com/matrix-org/sliding-sync/pull/390~
  • [x] ~https://github.com/element-hq/synapse/issues/17117~
  • [ ] https://github.com/element-hq/element-meta/issues/2411: E2E Device lists can get out of sync with the devices actually present in a room, causing keys not to be sent. This can happen for various reasons: see the linked issue.
  • [ ] https://github.com/element-hq/element-meta/issues/2155
  • [ ] https://github.com/element-hq/synapse/issues/16940
  • [ ] https://github.com/element-hq/synapse/issues/17050
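To make the device-list item above concrete: if the sender's view of the devices in a room lags behind the devices actually present, the difference is exactly the set of devices that never receive keys. The sketch below computes that difference. The function name, the dictionary layout, and the user and device IDs in the usage note are hypothetical and are not taken from any SDK.

```python
def devices_missing_keys(devices_in_room, tracked_devices):
    """Return {user_id: set of device IDs} that the sender does not know about.

    devices_in_room: devices actually present, as {user_id: iterable of IDs}.
    tracked_devices: the sender's (possibly stale) view, in the same shape.
    """
    missing = {}
    for user_id, devices in devices_in_room.items():
        untracked = set(devices) - set(tracked_devices.get(user_id, ()))
        if untracked:
            missing[user_id] = untracked
    return missing
```

For example, if `@alice:example.org` has devices `PHONE` and `LAPTOP` but the sender only tracks `PHONE`, the result contains `LAPTOP` for that user, and messages would be undecryptable on that device.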

Key Backups

  • [x] ~https://github.com/matrix-org/matrix-rust-sdk/issues/3197~
  • [x] ~https://github.com/element-hq/element-meta/issues/2338~
  • [ ] https://github.com/element-hq/element-meta/issues/2322
  • [ ] https://github.com/matrix-org/matrix-rust-sdk/issues/3875

Protocol Issues

  • [x] ~A client has been offline for too long and the senders have run out of one time keys. This will be addressed by the proposal to maintain fallback keys.~
  • [ ] https://github.com/matrix-org/matrix-spec/issues/1209
  • [ ] https://github.com/matrix-org/matrix-spec/issues/1123
  • [ ] https://github.com/matrix-org/matrix-spec/issues/1124
  • [ ] https://github.com/matrix-org/matrix-spec/issues/1659
  • [ ] https://github.com/element-hq/element-meta/issues/2154
  • [ ] https://github.com/element-hq/element-meta/issues/2268
  • [ ] https://github.com/element-hq/element-meta/issues/2374
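The first item in the list above (senders running out of one-time keys for a device that has been offline too long) can be illustrated with a toy model. This is a sketch under stated assumptions, not server code: `ToyKeyServer` and `claim_key` are hypothetical names, and the model keeps only the idea that one-time keys are consumed when claimed while a fallback key can be handed out repeatedly once they are exhausted.

```python
class ToyKeyServer:
    """Toy model of a server handing out keys for one offline device."""

    def __init__(self, one_time_keys, fallback_key=None):
        self.one_time_keys = list(one_time_keys)
        self.fallback_key = fallback_key

    def claim_key(self):
        # One-time keys are used up: each claim removes one from the pool.
        if self.one_time_keys:
            return self.one_time_keys.pop(0)
        # Pool exhausted: return the fallback key if the device uploaded
        # one, otherwise None (the sender cannot set up a session).
        return self.fallback_key
```

Without a fallback key, every claim after the pool is empty returns `None`, which corresponds to the failure described in that item; with a fallback key, later senders can still establish a session.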

Missing features

  • [ ] https://github.com/vector-im/element-meta/issues/646
  • [ ] If all devices are logged out there will be no end to encrypt to. This will be addressed by the proposal for dehydrated devices. https://github.com/vector-im/element-meta/issues/922

UX

Expected UTD

  • [x] ~Some room history (pre-invite for example), should never be decryptable by you. This kind of history should probably be hidden or displayed differently.~
  • [x] ~https://github.com/element-hq/element-meta/issues/2313: New devices cannot decrypt existing history until they have access to key backup~

User Config

  • [ ] https://github.com/element-hq/element-meta/issues/2450

BillCarsonFr avatar Apr 28 '22 07:04 BillCarsonFr

Interesting related blog post https://blog.neko.dev/posts/unable-to-decrypt-matrix.html

BillCarsonFr avatar May 16 '22 16:05 BillCarsonFr

@BillCarsonFr Wow what a great write-up you found at https://blog.neko.dev/posts/unable-to-decrypt-matrix.html

Please correct me if I'm wrong, but having looked at that, it seems that just about all of those errors could be resolved via wizard/options provided to the user by the client which could try to resend encryption key with help of server or to even create a new one for the channel/room. I don't see any 'technical' issue/blocker other than currently missing functionality client-side to prompt user for permission to, with the help of in-between server, do what's necessary to fix room encryption/decryption.

If there's a 'security' issue with properly re-identifying the correct user, for example to re-provide keys to the other user while they've gone offline so they can pick them up from server when they are online again, can't we have a simple "challenge/response" ie question/answer to "pick-up" the keys? (I'm trying to avoid creation of a "new" room/channel and instead fix one already created but giving errors of unable to decrypt. The sender's device has not sent us the keys for this message.)

It seems the reason certain steps aren't by default taken to automatically fix such issues, is that it could be a security risk in certain situations. But if the "creator" of a room gives needed permission, it seems in a sense trivial to resolve such decryption/key errors etc. Or am I wrong and missed something?

And to the author of: https://blog.neko.dev/posts/unable-to-decrypt-matrix.html Many thanks!

jittygitty avatar Nov 17 '22 23:11 jittygitty

Please correct me if I'm wrong, but having looked at that, it seems that just about all of those errors could be resolved via wizard/options provided to the user by the client which could try to resend encryption key

FTR, there used to be such a UI, but it was prone to social attacks and annoying to use. It's something that has been explored, but it's hard to have a "fix all encryption problems" button. We are trying to get to the bottom of the root causes of key distribution failures by moving to the Rust SDK.

BillCarsonFr avatar Nov 22 '22 12:11 BillCarsonFr

@BillCarsonFr Ah ok guess I'm relatively new so didn't know of the older UI. Personally, I'd be ok with the "social attack" risks and UI annoyance versus the embarrassment of inviting users to my new chat and getting decryption errors.

My concern was that it may be impossible to fix the root for all cases of decryption issues, especially if some are due to security measures which may need to be over-ridden by user consent in order to fix, which brings us back to that UI annoyance.

But regardless, it's great to hear the underlying sdk is being improved to hopefully eliminate or greatly reduce these issues. I had heard of rust in conduit (I run go-dendrite) but didn't know Kotlin SDK is being all redone in rust, is that right?

Anyway, thanks again to everyone working on this! (I look forward to some beta-testing with the new sdk when its ready for that.)

jittygitty avatar Nov 23 '22 08:11 jittygitty

The link to send debug logs does not appear on Element Web on my server. Did this UI change, or is there some option that an admin needs to enable to get that to show up?

anon8675309 avatar Mar 16 '23 22:03 anon8675309

@anon8675309 your config.json must have the URL to send debug logs to, like the example https://github.com/vector-im/element-web/blob/develop/config.sample.json#L25
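A quick way to check the setting mentioned above is to load the config file and look for the key. The sketch below assumes only what this thread states, namely that `config.json` needs a `bug_report_endpoint_url` entry; the function name is hypothetical, and the URL in the usage note is a placeholder rather than a real endpoint.

```python
import json


def has_bug_report_endpoint(config_text):
    """Return True if the config JSON has a non-empty bug_report_endpoint_url."""
    config = json.loads(config_text)
    url = config.get("bug_report_endpoint_url")
    return isinstance(url, str) and url.strip() != ""
```

For example, `has_bug_report_endpoint('{"bug_report_endpoint_url": "https://example.org/submit"}')` returns `True`. The check only confirms that the key is present; it cannot tell whether the deployed client actually loaded that file.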

t3chguy avatar Mar 17 '23 08:03 t3chguy

Can you confirm that this is still the case? I have the bug_report_endpoint_url entry from the sample and the link to send debug logs does not appear. If it's working as expected for you, I'll set up a new server and open a new ticket with the minimal steps to reproduce. (I searched for a report of this issue and didn't find anything, but I'll do so again before opening a new issue).

anon8675309 avatar Mar 19 '23 19:03 anon8675309

Was able to send feedback from Element desktop... and have this issue with one single user on my home server, both users on the same server. Upgraded the room to version 10 and got the error again after not even 10 messages

hieronymousch avatar May 01 '23 00:05 hieronymousch

Hi there, I have been encountering this problem a lot recently. I've already sent logs from element-web and element-android as the person who received encrypted messages that cannot be decrypted, but I have not yet been able to send logs as the person sending such problematic messages.

Question: when this situation happens in a matrix room, where a given user ends up only sending messages that cannot be decrypted by other room members, is there any (even intricate) known workaround? The only "workaround" I used so far was to upgrade the room to a newer room version, which solves the issue by creating a new room, but I can't really call this a proper solution… and right now I have this problem on a room that is already using version 10…

Ezwen avatar Jun 01 '23 15:06 Ezwen

@Ezwen I've found that asking the sender of the messages to run /discardsession usually fixes messages going forward, though it does not fix the messages that already failed to decrypt.

lousando avatar Jun 02 '23 02:06 lousando

@BillCarsonFr just so that I'm on the same page: is there a lack of sender / receiver logs to review?

zetaomegagon avatar Jul 09 '23 17:07 zetaomegagon

Will be fixed with Element R

For the uninitiated, could you please link to something explaining what this is? It's ungoogleable as R is an element on the periodic table, and exists in matrixes sometimes :)

theelous3 avatar Jul 10 '23 11:07 theelous3

I agree with @theelous3 that adding context on Element R would be helpful. From my quick search it seems it's probably short for Element Rust which is Element using the matrix-rust-sdk. Related links:

Anyway, good job team on getting this rolling :)

RayBB avatar Jul 13 '23 14:07 RayBB

Is there any ETA? I'm living in an area with really bad cellphone coverage; we're getting UTD messages as soon as somebody leaves the village (well, those of us who didn't switch back to WhatsApp).

yennor avatar Jul 13 '23 14:07 yennor

@yennor Not an ETA, but perhaps a short-term remedy. I've found that installing the newer Element Android v1.6.3 version that's based on the Rust SDK has greatly helped in my rooms.

You'll likely need the "vector-gplay-rustCrypto-arm64-v8a-release.apk" file.

lousando avatar Jul 13 '23 17:07 lousando

@lousando thanks, I'll give it a try and hope for the best. What about the element-desktop version? can't find anything about rust there.

yennor avatar Jul 23 '23 14:07 yennor

@lousando thanks, I'll give it a try and hope for the best. What about the element-desktop version? can't find anything about rust there.

Yeah, I'm not entirely sure whether the desktop app has been swapped over to use the Rust SDK. I usually only keep up with the Android repo, as that is usually where the decryption problem begins for me and my recipients.

lousando avatar Jul 23 '23 15:07 lousando

Element Web/Desktop hasn't been changed to use the Rust SDK for encryption yet. It's a labs option (not one of the betas) which has to be enabled in the config.md, not just in the labs menu.

ninchuka avatar Jul 31 '23 11:07 ninchuka

Every friend group or company I've been a part of that evaluated Element/Matrix has run into this issue and has eventually given up on Matrix with this app. It's completely unrealistic to expect a non-technical person to use it, given how often the basic functionality of reading a message breaks and how hard it currently is for users to recover from a broken session in an encrypted group chat with multiple people. You're spending more time on tech support than actually communicating. I can't overstate how important it is that this issue finally gets fixed after multiple years.

kwinz avatar Feb 12 '24 17:02 kwinz

Agree with @kwinz's comment above. I also can't stress enough how critical this UX problem is. I can't count the number of people I've encouraged to try Element/Matrix who were turned off by this baffling problem. There are group chats that work fine for a few weeks/months, but then suddenly the messages of one (or a few) people show up as "Unable to decrypt" for everyone else.

This is an incredibly big problem, and I've heard from many people across the world that this is THE REASON they gave up on Element/Matrix.

penyuan avatar Feb 12 '24 17:02 penyuan

Agree with @kwinz's comment above. I also can't stress enough how critical this UX problem is. I can't count the number of people I've encouraged to try Element/Matrix who were turned off by this baffling problem. There are group chats that work fine for a few weeks/months, but then suddenly the messages of one (or a few) people show up as "Unable to decrypt" for everyone else.

This is an incredibly big problem, and I've heard from many people across the world that this is THE REASON they gave up on Element/Matrix.

I concur. If I have to explain to a friend why their messages behave like this, they'll just go back to WhatsApp and consider me annoying.

fuomag9 avatar Feb 18 '24 16:02 fuomag9

We're actively working on this, despite the lack of activity on this particular issue.

In particular, within the last few months we have a dedicated test suite now to identify these sorts of failure modes, which will ensure that clients using the rust SDK FFI bindings (e.g Element X) or JS SDK with rust crypto (e.g Element Web) work correctly going forwards. We're also working through the causes we've identified and fixing them with regression tests where appropriate.

This is going to take some more time I'm afraid, and we all understand how frustrating it is when things break. In the meantime, if you do happen to be using Element X and/or Element-Web and happen to see a message which is undecryptable, please send a rageshake: I actively review unable to decrypt bug reports when they come in on those particular clients.

kegsay avatar Feb 22 '24 12:02 kegsay

if you do happen to be using Element X and/or Element-Web [...] I actively review unable to decrypt bug reports when they come in on those particular clients.

@kegsay Does this include Element-Desktop?

foresto avatar Feb 22 '24 19:02 foresto

Currently this does not include Element Desktop as rust crypto isn't enabled by default (yet). We expect this will change on a timescale of weeks not months. We've been discussing remaining blockers literally yesterday. I'll edit this issue when it's enabled by default on Element-Desktop.

2024-03-18: Element Desktop still doesn't have rust crypto enabled by default yet, but I will update this issue when it is.

2024-04-11: We are beginning to roll out to Element Desktop incrementally.

kegsay avatar Feb 23 '24 16:02 kegsay

Created a new issue that could cause UTDs https://github.com/element-hq/element-meta/issues/2374

BillCarsonFr avatar Mar 25 '24 10:03 BillCarsonFr

2024-03-18: Element Desktop still doesn't have rust crypto enabled by default yet, but I will update this issue when it is.

2024-04-11: We are beginning to roll out to Element Desktop incrementally.

Element Desktop received Rust crypto for new sessions back in February, for what it's worth. Rollout of migration for existing sessions is tracked at https://github.com/element-hq/element-web/issues/27001.

richvdh avatar Apr 26 '24 18:04 richvdh

Updates on recent progress here:

  • We found and fixed a cause of broken Olm sessions on Element X iOS. You don't have to be using Element X iOS yourself to notice UTDs in this scenario: it causes problems on the sender and recipient side alike.
  • We found and fixed a bug which caused Element X to upload broken key backups.
  • We also found a bug in the Dart SDK which caused clients based on that SDK to upload broken key backups. Thanks to @krille-chan and team for fixing it.
  • We rolled out an update to Element X and Element Web R which identifies events that were sent before you logged in, and shows the resultant UTD with a clearer error message.
  • We found and fixed a bug which could cause Synapse to drop to-device messages (potentially containing message keys) during periods of high load.
  • We found and fixed a bug in the Rust matrix-sdk-crypto (so affecting Element Web R and all mobile Element clients) which could cause it to fail to download key backups (thus preventing access to messages sent before you logged in).
  • We found and fixed a bug in Synapse which could cause it to give incorrect membership state to clients, which would mean senders not sharing keys with all members of the room. (It is known, however, that there are more bugs in this area).

richvdh avatar May 31 '24 17:05 richvdh

Updates on recent progress here:

...

Just want to say a big thank you to @richvdh for this progress update. ❤️ There is still MUCH that needs to be fixed in terms of bugs and UI/UX when it comes to encryption/decryption weirdness, and so many people I introduce to Element still get bounced off because of these problems. But at least getting these progress updates is helpful!

penyuan avatar Jun 01 '24 10:06 penyuan

Recently I've been giving updates for this on This Week in Matrix. If you fail to decrypt a message please:

  • send a bug report to us and mention:
    • the event ID which failed to decrypt,
    • whether it was 1 or many events which failed to decrypt,
    • if many events, are they all from different people?
  • ask the sender to also send a bug report mentioning the event ID

We often need both sides of the conversation to fix the issue.
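The checklist above can be turned into a small helper that assembles the requested details into report text. This is only a convenience sketch: `format_utd_report` is a hypothetical function, the output wording is made up for this example, and nothing here talks to the rageshake system.

```python
def format_utd_report(failed_events):
    """Build report text from a list of (event_id, sender) tuples."""
    if not failed_events:
        raise ValueError("at least one failed event is required")
    lines = [f"Events that failed to decrypt: {len(failed_events)}"]
    for event_id, sender in failed_events:
        lines.append(f"- {event_id} from {sender}")
    if len(failed_events) > 1:
        # Answers "if many events, are they all from different people?"
        senders = {sender for _, sender in failed_events}
        all_different = len(senders) == len(failed_events)
        lines.append(f"All from different senders: {'yes' if all_different else 'no'}")
    return "\n".join(lines)
```

The resulting text can be pasted into the description field when submitting logs, and the same event IDs can be passed on to the sender for their report.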

It would also be helpful for us if you can opt-in to analytics, as that feeds into our graphs which plot UTDs in aggregate. The general trend of the past few months has thankfully been fewer UTDs across clients that opt-in, but there is more work to be done here.

kegsay avatar Jun 20 '24 06:06 kegsay

I usually got this kind of problem when I or the peer was in an area with a bad mobile phone connection. By bad I mean really bad: you can get disconnected from the network for several minutes at a time, randomly get connected again for a few seconds, or be connected while almost no data gets through. I haven't been there (a rural area of Colombia) since last year and won't be for a few months, so I can't tell whether the situation has improved with the new clients. But maybe for your test suite, simulate random TCP packet drops (a very high percentage) with a high RTT (several seconds; I sometimes measured up to 20 seconds, and around 5-8 seconds is normal), and send a few thousand messages there and back.
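A very rough sketch of the kind of simulation suggested above is given below. It models only random loss, not RTT, and it does not use any real network or test framework; `simulate_delivery` and its parameters are hypothetical names for this example. Each message is attempted up to `max_attempts` times, and each attempt is lost independently with probability `loss_rate`.

```python
import random


def simulate_delivery(n_messages, loss_rate, max_attempts, seed=0):
    """Return the fraction of messages delivered over a lossy link."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    delivered = 0
    for _ in range(n_messages):
        for _ in range(max_attempts):
            # An attempt gets through when the draw is not in the lossy range.
            if rng.random() >= loss_rate:
                delivered += 1
                break
    return delivered / n_messages
```

With independent losses, a message that is attempted n times gets through with probability 1 - loss_rate^n, so comparing `max_attempts=1` against a larger value at a high loss rate gives a feel for how much persistence such a link demands.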

yennor avatar Jun 20 '24 07:06 yennor