derived_tstamp incorrect result with long delay between dvce_created_tstamp and dvce_sent_tstamp
Project: Stream Enrich Version: stream-enrich-1.4.2-common-1.4.2
Expected behavior:
To have derived_tstamp be calculated correctly when the difference between dvce_created_tstamp and dvce_sent_tstamp is large (disregarding possible issues due to changes in time zone).
Actual behavior:
I was processing some older data and noticed a page view with end_tstamp before start_tstamp. After tracking the issue down to a single page ping, it seems derived_tstamp is calculated incorrectly when the gap between dvce_created_tstamp and dvce_sent_tstamp is long. Here are the timestamp fields from the page ping event:
etl_tstamp 2021-04-09 04:34:26.254 UTC
collector_tstamp 2021-04-09 04:34:25.251 UTC
dvce_created_tstamp 2021-02-11 09:35:55.807 UTC
dvce_sent_tstamp 2021-04-09 04:34:24.785 UTC
derived_tstamp 2021-02-08 09:35:56.273 UTC
The derived_tstamp should be 2021-02-11 09:35:56.273 UTC, but I'd assume that, due to the way the Period class gets subtracted from the timestamp, the output is three days off in this case.
Steps to reproduce:
I haven't been able to test this myself yet since I don't currently have a Scala environment set up, but I'd assume that passing those values to the getDerivedTimestamp method would produce the same erroneous timestamp.
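In lieu of a Scala test, here is a rough Python sketch of the arithmetic I suspect is involved. It assumes derived_tstamp is computed as collector_tstamp minus a calendar-aware period (whole months plus a remainder) between dvce_created_tstamp and dvce_sent_tstamp. That is my guess at the behavior, not a copy of the enrich code, and the helper names (`add_months`, `calendar_period`, `derived_via_period`, `derived_via_duration`) are made up for illustration.

```python
from datetime import datetime
import calendar


def add_months(dt, n):
    """Shift dt by n calendar months, clamping the day to the target month's length."""
    idx = dt.year * 12 + (dt.month - 1) + n
    year, month = divmod(idx, 12)
    month += 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def calendar_period(start, end):
    """Split end - start into whole calendar months plus a fixed-length remainder."""
    months = 0
    while add_months(start, months + 1) <= end:
        months += 1
    return months, end - add_months(start, months)


def derived_via_period(created, sent, collector):
    # Assumed buggy behavior: the "1 month" measured from February (28 days)
    # is subtracted from an April date, where stepping back a month spans 31 days.
    months, rest = calendar_period(created, sent)
    return add_months(collector, -months) - rest


def derived_via_duration(created, sent, collector):
    # Subtract the exact elapsed time instead of a calendar period.
    return collector - (sent - created)


# Values from the page ping event above (all UTC).
created = datetime(2021, 2, 11, 9, 35, 55, 807000)
sent = datetime(2021, 4, 9, 4, 34, 24, 785000)
collector = datetime(2021, 4, 9, 4, 34, 25, 251000)

print(derived_via_period(created, sent, collector))    # 2021-02-08 09:35:56.273000
print(derived_via_duration(created, sent, collector))  # 2021-02-11 09:35:56.273000
```

Under that assumption the period-based version reproduces the wrong value exactly: the gap is split into 1 month (Feb 11 to Mar 11, 28 days) plus 28 days 18:58:28.978, but stepping back one month from Apr 9 lands on Mar 9, which is 31 days. The 3-day difference matches what I'm seeing. Subtracting the exact elapsed duration gives the expected timestamp.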
@mbehm - just to rule it out, is true_tstamp set for this event?
true_tstamp is used for derived_tstamp when set. It doesn't seem likely that this is what's happening, but I thought I'd check. :)
Ah yeah, I forgot to include that in the timestamp fields. No, true_tstamp isn't set.