opentelemetry-js icon indicating copy to clipboard operation
opentelemetry-js copied to clipboard

fix(tracing): make spans resilient to performance clock drift

Open dyladan opened this issue 1 year ago • 2 comments

This is a simpler alternative to #3384

Described by the comment here: https://github.com/open-telemetry/opentelemetry-js/issues/3279#issuecomment-1320064182

Fixes #3355

dyladan avatar Nov 21 '22 18:11 dyladan

Codecov Report

Merging #3434 (830c799) into main (2dcc898) will increase coverage by 0.02%. The diff coverage is 96.96%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3434      +/-   ##
==========================================
+ Coverage   93.78%   93.80%   +0.02%     
==========================================
  Files         249      249              
  Lines        7621     7637      +16     
  Branches     1588     1589       +1     
==========================================
+ Hits         7147     7164      +17     
+ Misses        474      473       -1     
Impacted Files Coverage Δ
packages/opentelemetry-sdk-trace-base/src/Span.ts 98.58% <95.65%> (-1.42%) :arrow_down:
...s/opentelemetry-instrumentation-fetch/src/fetch.ts 97.02% <100.00%> (+0.01%) :arrow_up:
...emetry-instrumentation-xml-http-request/src/xhr.ts 97.59% <100.00%> (+0.01%) :arrow_up:
packages/opentelemetry-core/src/common/time.ts 98.55% <100.00%> (+2.96%) :arrow_up:
...-trace-base/src/platform/node/RandomIdGenerator.ts 93.75% <0.00%> (+6.25%) :arrow_up:

codecov[bot] avatar Nov 21 '22 18:11 codecov[bot]

like previous pr, codesandbox with the changes, artificial drift and most common instrumentations (with already fixed the _performanceOffset issue in first comment)

t2t2 avatar Nov 22 '22 18:11 t2t2

The test (Span # should have an entered time for event) is unstable with the change.

legendecas avatar Jan 09 '23 03:01 legendecas

The test (Span # should have an entered time for event) is unstable with the change.

Yeah I see that. I'm trying to work on it now. ~~I think it was related to changing span end to be a Date~~

edit: I see the test relates to events not end. Still looking into it

dyladan avatar Jan 09 '23 13:01 dyladan

Looks like the test was providing time 0 to startTime for the span which caused the span start to use Date.now to generate a start time. It then provided 123 to the event time which caused a drift correction. The test was flaky because it depended on the drift correction being millisecond accurate.

dyladan avatar Jan 09 '23 16:01 dyladan

@legendecas should be stable now

dyladan avatar Jan 09 '23 16:01 dyladan

@t2t2 I updated the web instrumentations to use Date instead of performance.now to get span end times which I think should address your concerns. I would appreciate your review because you've been very helpful. Hopefully we can get this merged soon.

dyladan avatar Jan 09 '23 19:01 dyladan