element-meta Gather sender client information for UTDs via device keys info

In our analytics data (in Posthog), we want to be able to distinguish UTD events based on the client and version of the sender of the event (because often the problem is on the sender side rather than the recipient). The problem is that currently, we don't know this information.

It is proposed to resolve this as follows:

Include information on client information in the device keys that are uploaded via /keys/upload and then downloaded via /keys/query.
When we receive an undecryptable room message, locate the sending device via the sender_key (or device_id) embedded in the message, so that we can include the client metadata in the Posthog event.

Potential issue: sender_key and device_id are both deprecated. But they are still there today, so probably good enough for our purposes?

Notes:

We will need to update the legal team to ensure that the privacy policy is updated: https://github.com/element-hq/legal-compliance/issues/445.
We will need to re-upload the device keys each time the client is upgraded to ensure the version is maintained. (This of course leads to races where, by the time a message is received, the sender has upgraded to a different client version. Hopefully that won't happen often enough to significantly impact the data.)
Consider whether we should only expose this information for people who have opted into analytics. Or some other control. It does expose the client information to anyone on the internet who cares to ask, which could be seen as something of a privacy leak.

May 02 '24 16:05 richvdh

In terms of protocol changes, I'd like to add an object like the following to the response to /keys/query:

{
    "io.element.client_data": {
        "app_name": "Element",
        "app_platform": "Web Platform",
        "app_version": "1.11.70"
    }
}

These properties are based on those already sent to Posthog.

It's worth noting that we can't simply add this data to the body of the key uploaded with /keys/upload, otherwise a change to app_version will invalidate any cross-signing signatures, and hence require re-verification of the device. Instead, we must add it to the unsigned section, but that isn't currently exposed via /keys/upload. In short, whatever we do, we need server-side and spec changes.

Should the data format be specced?

Instead of using an unspecced io.element.client_data property, we could push a change into the spec. Given this will have no "functional" effect, and is solely for our own analytics, my inclination is not to. Open to discussion here though, particularly since we need spec changes anyway.

Uploading the new data

I'm currently torn between two approaches:

Extend PUT /devices/{deviceId} to accept a new property public_data, which contains arbitrary namespaced data, and can then be returned via /keys/query.
Extend POST /keys/upload to accept an unsigned property, which is then merged with the existing unsigned property for /keys/query.

Advantages of the PUT /devices/{deviceId} approach:

The semantics of the new data are similar to [device_]display_name.
It might be a bit easier to integrate in clients, since the client itself is responsible for hitting PUT /devices/{deviceId}, whereas the crypto stack has to hit POST /keys/upload.
(Not an advantage per se, but it's worth noting that updates via [PUT /devices/{deviceId}] do trigger device-list updates to tell other users to refresh their device list cache, which is important here.)

Advantages of the POST /keys/upload approach:

It ensures that other devices can't erroneously overwrite a given device's client_data. Per REST semantics, PUT /devices/{deviceId} should replace the whole of the device data, so this seems like a significant risk. Otherwise we have to invent PATCH /devices/{deviceId} and specify merge semantics.
device_display_name is only exposed over federation if a config setting is enabled, which it's not by default. It's not entirely clear that we should just bolt on a generic public_data and blindly share everything over federation, when, at least by default, other things in PUT /devices/{deviceId} are not shared in this way.

Jul 10 '24 18:07 richvdh

Potential issue: sender_key and device_id are both deprecated. But they are still there today, so probably good enough for our purposes?

The spec implies that session_id should give us enough info to look up the session, so maybe we could find the sender_key or device_id via the session? I'm just musing on what we can do to avoid making more work for ourselves in future when these properties disappear.

Nov 15 '24 10:11 andybalaam

We are leaning towards the /keys/upload approach, so I will investigate that a bit further.

Nov 15 '24 10:11 andybalaam

Potential issue: sender_key and device_id are both deprecated. But they are still there today, so probably good enough for our purposes?

The spec implies that session_id should give us enough info to look up the session, so maybe we could find the sender_key or device_id via the session?

If we recognised the session, then we could almost certainly decrypt the message :(

Nov 15 '24 11:11 richvdh

Rough breakdown of work that will be needed:

Legal: clarify the chosen path, ask whether it's OK, and perform any required actions e.g. updates to privacy policy
Spec: create MSC that adds unsigned to /keys/upload, including the effect of this on what comes down over /keys/query and how any clashes are resolved.
Spec: respond to required changes, including re-planning the work if plans need to change
Spec: shepherd MSC until merged
Synapse: add unsigned to /keys/upload and include it in /keys/query
PostHog: Add optional properties to the PostHog schema for UTDs, allowing providing the client version info
PostHog: Add a graph to the posthog crypto dashboard: UTDs breakdown by Element X, Element Classic and non-Element
matrix-rust-sdk: allow providing client version info during client setup (note: if client version info is not supplied, delete the stored value: user may have disabled the analytics opt-in, so we must reflect it)
matrix-rust-sdk: ensure that if we receive unexpected keys in /keys/query's unsigned section, we do not crash
matrix-rust-sdk: include client version info in /keys/upload requests, if it was supplied
matrix-rust-sdk: parse client version info from /keys/query requests and store it with a device
- Memory store
- Sqlite store
- IndexedDB store
matrix-rust-sdk: include client version info (where available) in UTD reports to PostHog
- Add optional sender client info to the enums used for UTD reporting
- When we detect a UTD, look up the device using sender_key or device_id or (preferably) session_id and include any client version info we find in the UTD report
matrix-rust-sdk: when client is created, detect whether the version has changed since last time, and if so re-upload to /keys/upload
Clients: If opted in to analytics, provide client version info to matrix-rust-sdk when creating the client
- Web and Desktop (note supplied info will differ between these two)
- Android Classic
- iOS Classic
- Android X
- iOS X

Nov 15 '24 11:11 andybalaam

If we recognised the session, then we could almost certainly decrypt the message :(

Ah! Right, so stick with the deprecated fields until we are forced to do something else.

Nov 15 '24 11:11 andybalaam

We are leaning towards the /keys/upload approach, so I will investigate that a bit further.

We revisited this in a chat, so just to record the points raised:

If we used PUT /devices/{deviceId}, then making sure that the application data was preserved when somebody updated the device display name (possibly from another client, possibly racing with an update from the original client) would be annoyingly fiddly
If/when we want the solution to work reliably over federation, then we'd have to spec that some bits of the PUT /devices/{deviceId} data have to be shared over federation, whereas they currently are not, in general.
@BillCarsonFr questioned if working reliably over federation was necessary, particularly if it was less effort to implement. I think that, if anything, the PUT /devices/{deviceId} approach is likely to need more changes (at least on the server side), even without federation support.

Nov 27 '24 12:11 richvdh