
Posthog: Stop double-reporting UTD events when the app is relaunched.

Open BillCarsonFr opened this issue 11 months ago • 4 comments

Currently we don't persist the list of UTD events that have been reported (it is kept in memory only), so the same UTD can be reported again after a relaunch. This reduces the accuracy of the metrics in Posthog.

We need to find a way to stop doing that:

  • Client side by persisting the list of events reported (beware of storage as this can only grow forever?)
  • Maybe possible to have the Posthog ingestion pipeline deduplicate? Not very confident as we don't have access to event_id and timestamp
  • Create a posthog plugin (data_in) that would allow to deduplicate based on some properties in the captured event (hash of event_id?) Some plugins are close but not exactly that
### Tasks
- [ ] https://github.com/element-hq/element-android/issues/8800
- [ ] (web) https://github.com/element-hq/element-web/issues/27421
- [ ] (EX) https://github.com/matrix-org/matrix-rust-sdk/issues/3374
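
For the client-side option, a minimal sketch of a persisted list that cannot grow forever might look like the following. Everything here is hypothetical and for illustration only: the `ReportedUtdStore` name, the JSON file backing, and the `max_entries` cap are assumptions, not what any of the clients actually implement. It stores a hash of each event ID rather than the ID itself, and evicts the oldest entries once the cap is reached.

```python
import hashlib
import json
from collections import OrderedDict
from pathlib import Path


class ReportedUtdStore:
    """Bounded, persisted set of already-reported UTD event IDs.

    Hypothetical helper: the name, file format and cap are assumptions.
    """

    def __init__(self, path, max_entries=5000):
        self._path = Path(path)
        self._max_entries = max_entries
        # Insertion-ordered, so the oldest entry is always first.
        self._seen = OrderedDict()
        if self._path.exists():
            # Reload what was reported before the app was relaunched.
            for digest in json.loads(self._path.read_text()):
                self._seen[digest] = None

    @staticmethod
    def _digest(event_id):
        # Persist a hash instead of the raw event ID.
        return hashlib.sha256(event_id.encode("utf-8")).hexdigest()

    def mark_if_new(self, event_id):
        """Return True if this UTD has not been reported yet, and record it."""
        digest = self._digest(event_id)
        if digest in self._seen:
            return False
        self._seen[digest] = None
        # Evict the oldest entries so storage stays bounded.
        while len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)
        self._path.write_text(json.dumps(list(self._seen)))
        return True
```

A client would call `mark_if_new(event_id)` before capturing the analytics event and skip the capture when it returns `False`. The trade-off of the cap is that a very old UTD whose entry has been evicted could be reported again.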

BillCarsonFr avatar Mar 11 '24 09:03 BillCarsonFr

The graphs we are focusing on show unique UTDs, so stopping the double reports won't impact them.

BillCarsonFr avatar Mar 11 '24 15:03 BillCarsonFr

Related https://github.com/element-hq/element-meta/issues/2332

@kegsay We deprioritised this because we only focus on unique errors. But double reporting will be annoying if we add the new properties. For example, a permanent UTD would be reported several times with different eventLocalAgeAtDecryptionFailure or userTrustsOwnIdentity values, which may make the data more annoying to analyse.

BillCarsonFr avatar Mar 15 '24 09:03 BillCarsonFr

Wouldn't we still be double-reporting between clients/sessions, especially as in pseudonymous mode the analytics all link to the same ID for cross-client matching?

t3chguy avatar May 01 '24 21:05 t3chguy

Yes, but I think we sort-of want that. If a user has multiple clients, and none of them can decrypt it, then that's different from if only one of them can't decrypt it. There might be something better that we can do in the case of multiple clients not being able to decrypt a single event. But as far as this particular issue is concerned, what we want to avoid is this: a user restarts the client and it reports a UTD that it already reported earlier (and of course it's still a UTD), but we can't distinguish that report from the client having received a new event that is another UTD.

uhoreg avatar May 01 '24 21:05 uhoreg

I think this is done.

richvdh avatar Jun 20 '24 10:06 richvdh