
Loading the session fails with "Unable to load session Error decrypting secret access_token: bad MAC"

Status: open · justjanne opened this issue 2 years ago · 7 comments

None of the previous reporters included any reproduction steps. They all reported Element refusing to launch with the aforementioned error.

Element fails with the following stack trace:

2023-02-16T19:42:57.280Z I Got pickle key
2023-02-16T19:42:57.281Z E Unable to load session Error decrypting secret access_token: bad MAC
Error: Error decrypting secret access_token: bad MAC
    at  s (webpack:///node_modules/matrix-js-sdk/src/crypto/aes.ts:95:14)
    at async te (webpack:///node_modules/matrix-react-sdk/src/Lifecycle.ts:455:16)
    at async Object.q (webpack:///node_modules/matrix-react-sdk/src/Lifecycle.ts:149:8)
    at async webpack:///node_modules/matrix-react-sdk/src/components/structures/MatrixChat.tsx:343:16

justjanne, Feb 17 '23

The vast majority of reports are from Linux, with 3 from Windows. Theory: a race condition between app launch and keyring unlocking.

t3chguy, Feb 20 '23

Removing this from our board: it is not a fire by our definition, so it won't be picked up by our processes.

CC @daniellekirkwood / @andybalaam

Johennes, Jul 07 '23

Given the number of reports we receive about this, I'm updating the labels.

richvdh, Jul 26 '24

Contrary to what you might expect given the error message, this is not related to end-to-end encryption.

Specifically, the problem comes from trying to decrypt the Matrix access token, which is stored encrypted in IndexedDB (in matrix-react-sdk.account) as an IEncryptedPayload with the format:

export interface IEncryptedPayload {
    /** the initialization vector in base64 */
    iv: string;
    /** the ciphertext in base64 */
    ciphertext: string;
    /** the HMAC in base64 */
    mac: string;
}

When encrypting the access token, we:

  • take an input key (the "pickle key")
  • feed it into an HKDF, with an info of access_token, to generate 512 bits (64 bytes) of key material
  • use the first 256 bits as an AES-CTR key
  • use the second 256 bits as an HMAC-SHA-256 key
  • encrypt the access token using the AES-CTR key, giving ciphertext
  • sign the ciphertext using the HMAC-SHA-256 key, giving mac.
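The encryption steps above can be sketched with Node's built-in crypto module. This is a minimal illustration under stated assumptions, not matrix-js-sdk's actual code: the all-zero 8-byte HKDF salt, the 16-byte random IV, and the helper names deriveKeys and encryptSecret are hypothetical, and Node's aes-256-ctr counter handling is not guaranteed to match the SDK's WebCrypto AES-CTR parameters byte for byte.

```typescript
import { createCipheriv, createHmac, hkdfSync, randomBytes } from "node:crypto";

// Mirrors the IEncryptedPayload shape quoted above; every field is base64.
interface IEncryptedPayload {
    iv: string;
    ciphertext: string;
    mac: string;
}

// HKDF-SHA-256 -> 64 bytes (512 bits) of key material, split into two keys.
// Assumption: an all-zero 8-byte salt; the comment only specifies the info string.
function deriveKeys(pickleKey: Uint8Array, info: string): { aesKey: Buffer; hmacKey: Buffer } {
    const material = Buffer.from(hkdfSync("sha256", pickleKey, Buffer.alloc(8), info, 64));
    return {
        aesKey: material.subarray(0, 32), // first 256 bits -> AES-256-CTR key
        hmacKey: material.subarray(32, 64), // second 256 bits -> HMAC-SHA-256 key
    };
}

// Hypothetical helper: encrypt, then compute the MAC over the ciphertext.
function encryptSecret(plaintext: string, pickleKey: Uint8Array, info: string): IEncryptedPayload {
    const { aesKey, hmacKey } = deriveKeys(pickleKey, info);
    const iv = randomBytes(16); // fresh random initial counter block
    const cipher = createCipheriv("aes-256-ctr", aesKey, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
    const mac = createHmac("sha256", hmacKey).update(ciphertext).digest();
    return {
        iv: iv.toString("base64"),
        ciphertext: ciphertext.toString("base64"),
        mac: mac.toString("base64"),
    };
}

// Usage with a hypothetical token and a fresh 32-byte pickle key.
const pickleKey = randomBytes(32);
const payload = encryptSecret("syt_example_token", pickleKey, "access_token");
console.log(payload);
```

Because the info string is an input to HKDF, the same pickle key yields unrelated key pairs for different secrets, while re-running the derivation with the same pickle key and info always reproduces the same pair.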

When decrypting, we therefore derive the same pair of keys and then verify that the HMAC computed over ciphertext matches mac. In other words, this error implies that a different pickle key is being used.
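To see why a mismatched pickle key surfaces as "bad MAC" rather than as garbled plaintext, here is a self-contained sketch of the decrypt side: re-derive both keys, check the MAC before decrypting, and throw on mismatch. The helper names, the zero HKDF salt and the use of Node's crypto module are assumptions; the error string is only modelled on the log line above, not taken from the SDK source.

```typescript
import { createCipheriv, createDecipheriv, createHmac, hkdfSync, randomBytes, timingSafeEqual } from "node:crypto";

interface IEncryptedPayload { iv: string; ciphertext: string; mac: string } // base64 fields

// Assumption: HKDF-SHA-256 with an all-zero 8-byte salt, 512 bits split in half.
function deriveKeys(pickleKey: Uint8Array, info: string): [Buffer, Buffer] {
    const material = Buffer.from(hkdfSync("sha256", pickleKey, Buffer.alloc(8), info, 64));
    return [material.subarray(0, 32), material.subarray(32)];
}

function encryptSecret(plaintext: string, pickleKey: Uint8Array, info: string): IEncryptedPayload {
    const [aesKey, hmacKey] = deriveKeys(pickleKey, info);
    const iv = randomBytes(16);
    const c = createCipheriv("aes-256-ctr", aesKey, iv);
    const ciphertext = Buffer.concat([c.update(plaintext, "utf8"), c.final()]);
    const mac = createHmac("sha256", hmacKey).update(ciphertext).digest();
    return { iv: iv.toString("base64"), ciphertext: ciphertext.toString("base64"), mac: mac.toString("base64") };
}

function decryptSecret(payload: IEncryptedPayload, pickleKey: Uint8Array, info: string): string {
    // Re-derive the key pair from whichever pickle key we were handed.
    const [aesKey, hmacKey] = deriveKeys(pickleKey, info);
    const ciphertext = Buffer.from(payload.ciphertext, "base64");
    const expected = createHmac("sha256", hmacKey).update(ciphertext).digest();
    const stored = Buffer.from(payload.mac, "base64");
    // Verify first: a wrong key (or a corrupted payload) fails here, before any decryption.
    if (stored.length !== expected.length || !timingSafeEqual(stored, expected)) {
        throw new Error(`Error decrypting secret ${info}: bad MAC`);
    }
    const d = createDecipheriv("aes-256-ctr", aesKey, Buffer.from(payload.iv, "base64"));
    return Buffer.concat([d.update(ciphertext), d.final()]).toString("utf8");
}

const loginKey = randomBytes(32);
const sealed = encryptSecret("syt_example_token", loginKey, "access_token");
console.log(decryptSecret(sealed, loginKey, "access_token")); // syt_example_token
try {
    decryptSecret(sealed, randomBytes(32), "access_token"); // a different pickle key
} catch (e) {
    console.log((e as Error).message); // Error decrypting secret access_token: bad MAC
}
```

Note that the MAC check cannot distinguish "wrong pickle key" from "ciphertext or mac corrupted on disk"; both produce the same error.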

The (misnamed) restoreFromLocalStorage method first retrieves the encrypted access token from IndexedDB, and then calls PlatformPeg.getPickleKey to fetch the pickle key. On the Electron platform, getPickleKey uses keytar to fetch a password named <userId>|<deviceId> from the element.io service in the system keyring. That password should be a 32-byte random array created when the user logged in.

What is particularly strange here is that a completely different pickle key is somehow being hallucinated. It's not that the pickle key is missing entirely; that would be more understandable, as some sort of failure to talk to the system keychain.
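The keyring lookup can be sketched as below. InMemoryKeyring is a hypothetical synchronous stand-in for keytar's getPassword/setPassword (which return Promises in the real library), and storing the key as base64 is an assumption. The sketch illustrates why a "different" key is plausible: the lookup is addressed purely by <userId>|<deviceId>, so a stale but previously valid device ID returns that device's perfectly good pickle key rather than nothing.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical stand-in for the system keyring: secrets addressed by (service, account).
class InMemoryKeyring {
    private entries = new Map<string, string>();
    getPassword(service: string, account: string): string | null {
        return this.entries.get(`${service}\u0000${account}`) ?? null;
    }
    setPassword(service: string, account: string, password: string): void {
        this.entries.set(`${service}\u0000${account}`, password);
    }
}

const SERVICE = "element.io";
const accountFor = (userId: string, deviceId: string): string => `${userId}|${deviceId}`;

// At login: create a fresh 32-byte random pickle key for this user/device pair.
function createPickleKey(kr: InMemoryKeyring, userId: string, deviceId: string): string {
    const key = randomBytes(32).toString("base64"); // base64 encoding is an assumption
    kr.setPassword(SERVICE, accountFor(userId, deviceId), key);
    return key;
}

// At startup: look the key up by whatever user/device pair the app believes is current.
function getPickleKey(kr: InMemoryKeyring, userId: string, deviceId: string): string | null {
    return kr.getPassword(SERVICE, accountFor(userId, deviceId));
}

const keyring = new InMemoryKeyring();
const k1 = createPickleKey(keyring, "@alice:example.org", "DEVICE1");
const k2 = createPickleKey(keyring, "@alice:example.org", "DEVICE2");
console.log(getPickleKey(keyring, "@alice:example.org", "DEVICE1") === k1); // true
console.log(k1 === k2); // false
```

Asking for a device ID that was never registered returns null, which corresponds to the "total absence" case that would be easier to explain.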

richvdh, Jul 26 '24

I wonder if this could be something like the following: the user has logged out and logged in again, causing a new pickle key to be created and a new user ID/device ID to be stored in localStorage.

However, IndexedDB is having a bit of a moment, and the new access token is not correctly persisted there. Hence, on restore, we fetch the new pickle key but try to decrypt the old encrypted access token with it.
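The hypothesis above can be simulated with three in-memory maps standing in for localStorage, IndexedDB and the system keyring. This is a hypothetical model, not Element's code: encryption is omitted, the pickle key is used directly as the HMAC key, and the keyring is keyed by device ID alone rather than <userId>|<deviceId>.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Simplification: only the MAC check is modelled, since that is where this failure shows up.
type Sealed = { token: string; mac: Buffer };
const macOf = (key: Buffer, token: string): Buffer => createHmac("sha256", key).update(token).digest();
const seal = (token: string, key: Buffer): Sealed => ({ token, mac: macOf(key, token) });

function unseal(s: Sealed, key: Buffer): string {
    if (!timingSafeEqual(macOf(key, s.token), s.mac)) {
        throw new Error("Error decrypting secret access_token: bad MAC");
    }
    return s.token;
}

// Hypothetical stand-ins for the three stores involved.
const localStore = new Map<string, string>(); // localStorage: current device ID
const indexedDb = new Map<string, Sealed>(); // IndexedDB: encrypted access token
const keyring = new Map<string, Buffer>(); // system keyring: pickle key per device ID

function login(deviceId: string, token: string, loseIdbWrite = false): void {
    const pickleKey = randomBytes(32); // a new pickle key on every login
    keyring.set(deviceId, pickleKey);
    localStore.set("deviceId", deviceId);
    if (!loseIdbWrite) indexedDb.set("accessToken", seal(token, pickleKey));
}

function restore(): string {
    const deviceId = localStore.get("deviceId")!;
    return unseal(indexedDb.get("accessToken")!, keyring.get(deviceId)!);
}

login("OLDDEVICE", "token_old");
console.log(restore()); // token_old

// Log out and in again, but the IndexedDB write of the new token is silently lost.
login("NEWDEVICE", "token_new", true);
try {
    restore(); // new device ID -> new pickle key, applied to the old token
} catch (e) {
    console.log((e as Error).message); // Error decrypting secret access_token: bad MAC
}
```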

richvdh, Jul 26 '24

I added some logging which might help diagnose this in https://github.com/matrix-org/matrix-react-sdk/pull/12831. I'd be interested to hear from people who observe it in nightlies or in 1.11.72 or later.

richvdh, Jul 28 '24

We're still seeing this occasionally. What I see in the logs is:

  • Day 1: User is happily using the app
  • Day 2 (maybe?): User starts app, but it seems to get stuck early during loading, just after claiming the session lock
  • Day 3: User starts app; it loads without an active session (it's unclear why the previous session was lost). They start a guest session, then log in.
  • Day 4: User starts app; it receives the device ID from Day 1.

It seems that the write to localStorage (for the device ID) on Day 3 isn't persisted, though the writes to IndexedDB (for the access token and rageshake logs) are.
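The Day 1 to Day 4 timeline can be replayed with in-memory maps standing in for the three stores, dropping the Day 3 localStorage write while keeping the IndexedDB one. This is a hypothetical model: the stores are plain maps, a bare HMAC stands in for the full encrypt-then-MAC payload (compared non-constant-time, which is fine for a simulation), and the Day 2 stall and Day 3 guest session are not modelled.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// Hypothetical stand-ins: localStorage (device ID), IndexedDB (token + MAC), keyring (pickle keys).
const localStore = new Map<string, string>();
const indexedDb = new Map<string, { token: string; mac: string }>();
const keyring = new Map<string, Buffer>();
const macOf = (key: Buffer, token: string): string => createHmac("sha256", key).update(token).digest("base64");

// A login on a given day creates a device, a pickle key and a token; `lose` drops a write.
function login(day: string, lose: { localStorage?: boolean } = {}): void {
    const deviceId = `DEVICE_${day}`;
    const token = `token_${day}`;
    const key = randomBytes(32);
    keyring.set(deviceId, key);
    if (!lose.localStorage) localStore.set("deviceId", deviceId);
    indexedDb.set("accessToken", { token, mac: macOf(key, token) });
}

// App startup: device ID from localStorage selects the key; the token comes from IndexedDB.
function startApp(): string {
    const deviceId = localStore.get("deviceId")!;
    const { token, mac } = indexedDb.get("accessToken")!;
    if (macOf(keyring.get(deviceId)!, token) !== mac) return `bad MAC (device ID ${deviceId}, ${token})`;
    return `restored ${token}`;
}

login("Day1");
console.log(startApp()); // restored token_Day1
// Day 3: a fresh login; the IndexedDB write persists but the localStorage write does not.
login("Day3", { localStorage: true });
console.log(startApp()); // Day 4: bad MAC (device ID DEVICE_Day1, token_Day3)
```

The mirror image of the earlier hypothesis gives the same symptom: whichever store loses its write, the device ID and the access token end up belonging to different logins.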

richvdh, Jan 23 '25

This is believed to be resolved by switching away from node-keytar to safeStorage, which has a better track record of avoiding the race conditions that lead to data corruption.

t3chguy, Jul 07 '25