enrich icon indicating copy to clipboard operation
enrich copied to clipboard

Common: improve derived_tstamp/implement better approximation of actual time of event

Open chuwy opened this issue 5 years ago • 5 comments
trafficstars

Currently, the derived timestamp logic is as follows:

derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)

As I understand it, the logic is that the collector timestamp is the first time we confidently establish a measurement of actual time, so we take that as the basis, and subtract lag between creating and sending the event.

This leads to unusual values in the case of events which are sent late due to connectivity issues.

A suggested improvement on this logic is as follows:

  • Assume a reasonable time between collector_tstamp and dvce_sent_tstamp (I think this would be within a second or a few seconds)
  • Calculate difference between collector_tstamp and dvce_sent_tstamp, if it's above that number - this gives you how far off the actual time of day the devices clock is.
  • Offset the dvce_created_tstamp by that delta to produce a more consistently accurate measurement of actual time of event.

I believe this is better because:

  • no matter what, the dvce_created_tstamp always gives you when the events happened in relation to each other.
  • Collector and dvce_sent timestamps should be very very close to each other. The difference between these values is therefore presumed to be an accurate measure of the device clock's offset from actual time UTC
  • Therefore the dvce_created_tstamp adjusted by this offset returns a good approximation of the actual time of event.

There's still scope for inaccuracies in the rare case that the device clock changes between events, but avoids strange values resulting from lag between sending consecutive events.

chuwy avatar Jun 19 '20 15:06 chuwy

Moved from https://github.com/snowplow/snowplow/issues/3921

chuwy avatar Jun 19 '20 15:06 chuwy

So that the context isn't lost - the issue which drove this ticket turned out not to be down to the logic behind derived_tstamp, but the fact that the logic doesn't account for a changing clock or timezone in between dvce_created and dvce_sent tstamps.

Imagine, for example, a user browsing an app in offline mode while on a flight, with a tz of UTC. When that user comes back online, tz changes to UTC +5. Then they reopen the app at some point, and the data is sent. derived_tstamp will be wrong since(dvce_sent_tstamp - dvce_created_tstamp) is off by 5 hours.

As Alex said in the original ticket, if we want to address this behaviour:

We could probably mostly fix this by tracking device TZ before sending and then adjusting as needed (i.e. manual correction to dvce_tstamp in tracker). Only mostly fix because we can't detect other changes like a manual clock adjustment. But they must be extremely rare on mobile.

colmsnowplow avatar Jun 19 '20 16:06 colmsnowplow

Don't worry @colmsnowplow. The script is cross-linking all issues and one will be able to follow back to original conversation. There will be ~300 migrated tickets in following hour.

chuwy avatar Jun 19 '20 16:06 chuwy

I am not sure if this is the correct ticket or not, but i am seeing some very strange timestamps from the android tracker

dvce_created_tstamp dvce_sent_tstamp collector_tstamp
2000-12-03T22:24:08.192Z 2020-07-19T22:26:59.457Z 2020-07-19T22:27:02.897Z
2000-12-06T01:31:08.195Z 2020-07-22T03:23:02.173Z 2020-07-22T03:23:04.136Z
2000-12-02T19:43:24.928Z 2020-07-22T03:23:02.383Z 2020-07-22T03:23:04.411Z
2000-12-02T19:44:32.857Z 2020-07-22T03:23:02.383Z 2020-07-22T03:23:04.411Z
2000-12-05T00:39:01.046Z 2020-07-22T03:23:02.173Z 2020-07-22T03:23:04.136Z
2000-12-06T01:12:18.353Z 2020-07-22T03:23:02.173Z 2020-07-22T03:23:04.136Z

camerondavison avatar Jul 22 '20 17:07 camerondavison

Hi @camerondavison! The best place for that kind of issue would be the Android tracker repo, since dvce_ tstamps are all created in the tracker.

However I would say that actually for this case, it might be better to open a thread on discourse.snowplowanalytics.com - it's not immediately obvious to me that this is a bug in the tracker, so it might be helpful to discuss it there and dig into the details. :)

colmsnowplow avatar Jul 22 '20 17:07 colmsnowplow