element-meta
PostHog: Stop double-reporting UTD events when the app is relaunched.
Currently we don't persist the list of UTD events that have been reported (it is kept in memory only), so the same UTD is reported again after a relaunch. This impacts the accuracy of metrics in PostHog.
We need to find a way to stop doing that:
- Client side by persisting the list of events reported (beware of storage as this can only grow forever?)
- Maybe possible to have the Posthog ingestion pipeline deduplicate? Not very confident as we don't have access to event_id and timestamp
- Create a PostHog plugin (`data_in`) that would allow deduplicating based on some properties in the captured event (hash of event_id?). Some plugins are close but not exactly that
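
To make the first option more concrete, here is a minimal sketch of a client-side persisted list with a size cap, which addresses the "grows forever" concern by evicting the oldest entries. This is only an illustration and not what any of the clients actually implement: `KeyValueStore`, `MemoryStore`, `ReportedUtdLog`, `markReported`, the storage key and the default cap are all hypothetical names and values.

```typescript
// Hypothetical minimal storage interface. A real client could back it with
// localStorage, a file or a database; nothing here is an existing client API.
interface KeyValueStore {
    getItem(key: string): string | null;
    setItem(key: string, value: string): void;
}

// In-memory stand-in for persistent storage, used only for the example.
class MemoryStore implements KeyValueStore {
    private data = new Map<string, string>();

    getItem(key: string): string | null {
        return this.data.get(key) ?? null;
    }

    setItem(key: string, value: string): void {
        this.data.set(key, value);
    }
}

// Remembers which UTD event IDs were already reported, across relaunches.
// The list is capped: once maxEntries is exceeded, the oldest IDs are dropped.
class ReportedUtdLog {
    private store: KeyValueStore;
    private maxEntries: number;
    private storageKey: string;
    private ids: string[]; // oldest first
    private seen: Set<string>;

    constructor(store: KeyValueStore, maxEntries = 5000, storageKey = "reported_utd_event_ids") {
        this.store = store;
        this.maxEntries = maxEntries;
        this.storageKey = storageKey;
        // Reload whatever a previous run of the app persisted.
        const raw = store.getItem(storageKey);
        this.ids = raw ? (JSON.parse(raw) as string[]) : [];
        this.seen = new Set(this.ids);
    }

    // Returns true if this event has not been reported yet (and records it),
    // false if it was already reported and should be skipped.
    markReported(eventId: string): boolean {
        if (this.seen.has(eventId)) {
            return false;
        }
        this.ids.push(eventId);
        this.seen.add(eventId);
        // Evict the oldest entries so storage stays bounded.
        while (this.ids.length > this.maxEntries) {
            const evicted = this.ids.shift()!;
            this.seen.delete(evicted);
        }
        this.store.setItem(this.storageKey, JSON.stringify(this.ids));
        return true;
    }
}

// Usage: only send the analytics event when markReported() returns true.
const demoStore = new MemoryStore();
const beforeRelaunch = new ReportedUtdLog(demoStore);
const firstReport = beforeRelaunch.markReported("$event1");

// Simulated relaunch: a new instance reads the persisted list back.
const afterRelaunch = new ReportedUtdLog(demoStore);
const repeatedReport = afterRelaunch.markReported("$event1");
console.log(firstReport, repeatedReport);
```

The trade-off of a simple oldest-first cap is that a very old UTD could be reported again once its ID has been evicted. Storing a hash of the event ID instead of the raw ID would be a possible variation if the size of each entry matters.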
### Tasks
- [ ] https://github.com/element-hq/element-android/issues/8800
- [ ] (web) https://github.com/element-hq/element-web/issues/27421
- [ ] (EX) https://github.com/matrix-org/matrix-rust-sdk/issues/3374
The graphs we are focusing on are Unique UTD, so stopping the double reports won't impact them.
Related https://github.com/element-hq/element-meta/issues/2332
@kegsay We deprioritised this because we only focus on unique errors. But double reporting will be annoying if we add the new properties. For example, a permanent UTD would be reported several times with different `eventLocalAgeAtDecryptionFailure` or `userTrustsOwnIdentity` values, which may make the data more annoying to analyse.
Wouldn't we still be double-reporting between clients/sessions, especially as in pseudonymous mode the analytics all link to the same ID for cross-client matching?
Yes, but I think we sort-of want that. If a user has multiple clients, and none of them can decrypt it, then that's different from if only one of them can't decrypt it. There might be something better that we can do in the case of multiple clients not being able to decrypt a single event. But as far as this particular issue is concerned, what we want to avoid is the case where a user restarts the client and it reports a UTD that it already reported earlier. Of course it's still a UTD, but we can't distinguish that report from the client having received a new event that is another UTD.
I think this is done.