enrich
enrich copied to clipboard
Common: improve derived_tstamp/implement better approximation of actual time of event
Currently, the derived timestamp logic is as follows:
derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)
As I understand it, the logic is that the collector timestamp is the first time we confidently establish a measurement of actual time, so we take that as the basis, and subtract lag between creating and sending the event.
This leads to unusual values in the case of events which are sent late due to connectivity issues.
A suggested improvement on this logic is as follows:
- Assume a reasonable time between collector_tstamp and dvce_sent_tstamp (I think this would be within a second or a few seconds)
- Calculate difference between collector_tstamp and dvce_sent_tstamp, if it's above that number - this gives you how far off the actual time of day the devices clock is.
- Offset the dvce_created_tstamp by that delta to produce a more consistently accurate measurement of actual time of event.
I believe this is better because:
- no matter what, the dvce_created_tstamp always gives you when the events happened in relation to each other.
- Collector and dvce_sent timestamps should be very very close to each other. The difference between these values is therefore presumed to be an accurate measure of the device clock's offset from actual time UTC
- Therefore the dvce_created_tstamp adjusted by this offset returns a good approximation of the actual time of event.
There's still scope for inaccuracies in the rare case that the device clock changes between events, but avoids strange values resulting from lag between sending consecutive events.
Moved from https://github.com/snowplow/snowplow/issues/3921
So that the context isn't lost - the issue which drove this ticket turned out not to be down to the logic behind derived_tstamp, but the fact that the logic doesn't account for a changing clock or timezone in between dvce_created and dvce_sent tstamps.
Imagine, for example, a user browsing an app in offline mode while on a flight, with a tz of UTC. When that user comes back online, tz changes to UTC +5. Then they reopen the app at some point, and the data is sent. derived_tstamp will be wrong since(dvce_sent_tstamp - dvce_created_tstamp) is off by 5 hours.
As Alex said in the original ticket, if we want to address this behaviour:
We could probably mostly fix this by tracking device TZ before sending and then adjusting as needed (i.e. manual correction to dvce_tstamp in tracker). Only mostly fix because we can't detect other changes like a manual clock adjustment. But they must be extremely rare on mobile.
Don't worry @colmsnowplow. The script is cross-linking all issues and one will be able to follow back to original conversation. There will be ~300 migrated tickets in following hour.
I am not sure if this is the correct ticket or not, but i am seeing some very strange timestamps from the android tracker
| dvce_created_tstamp | dvce_sent_tstamp | collector_tstamp |
|---|---|---|
| 2000-12-03T22:24:08.192Z | 2020-07-19T22:26:59.457Z | 2020-07-19T22:27:02.897Z |
| 2000-12-06T01:31:08.195Z | 2020-07-22T03:23:02.173Z | 2020-07-22T03:23:04.136Z |
| 2000-12-02T19:43:24.928Z | 2020-07-22T03:23:02.383Z | 2020-07-22T03:23:04.411Z |
| 2000-12-02T19:44:32.857Z | 2020-07-22T03:23:02.383Z | 2020-07-22T03:23:04.411Z |
| 2000-12-05T00:39:01.046Z | 2020-07-22T03:23:02.173Z | 2020-07-22T03:23:04.136Z |
| 2000-12-06T01:12:18.353Z | 2020-07-22T03:23:02.173Z | 2020-07-22T03:23:04.136Z |
Hi @camerondavison! The best place for that kind of issue would be the Android tracker repo, since dvce_ tstamps are all created in the tracker.
However I would say that actually for this case, it might be better to open a thread on discourse.snowplowanalytics.com - it's not immediately obvious to me that this is a bug in the tracker, so it might be helpful to discuss it there and dig into the details. :)