trafficstars

Relates to #3279 particularly https://github.com/open-telemetry/opentelemetry-js/issues/3279#issuecomment-1302575571

Fixes #3355

Nov 04 '22 20:11 dyladan

Codecov Report

Merging #3384 (eb4c2ae) into main (bd56547) will decrease coverage by 0.70%. The diff coverage is 91.07%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3384      +/-   ##
==========================================
- Coverage   93.27%   92.57%   -0.71%     
==========================================
  Files         247      132     -115     
  Lines        7348     3152    -4196     
  Branches     1512      656     -856     
==========================================
- Hits         6854     2918    -3936     
+ Misses        494      234     -260

Impacted Files	Coverage Δ
packages/opentelemetry-core/src/common/time.ts	`95.45% <75.00%> (+0.06%)`	:arrow_up:
packages/opentelemetry-sdk-trace-base/src/Span.ts	`97.87% <93.75%> (-2.13%)`	:arrow_down:
api/src/platform/node/globalThis.ts	`0.00% <0.00%> (-100.00%)`	:arrow_down:
...s/opentelemetry-core/src/platform/node/sdk-info.ts	`0.00% <0.00%> (-100.00%)`	:arrow_down:
...opentelemetry-core/src/platform/node/globalThis.ts	`0.00% <0.00%> (-100.00%)`	:arrow_down:
...pentelemetry-core/src/platform/node/performance.ts	`0.00% <0.00%> (-100.00%)`	:arrow_down:
.../packages/api-logs/src/platform/node/globalThis.ts	`0.00% <0.00%> (-100.00%)`	:arrow_down:
...lemetry-resources/src/detectors/ProcessDetector.ts	`31.81% <0.00%> (-68.19%)`	:arrow_down:
...perimental/packages/otlp-exporter-base/src/util.ts	`78.94% <0.00%> (-15.79%)`	:arrow_down:
...ckages/opentelemetry-exporter-zipkin/src/zipkin.ts	`84.48% <0.00%> (-15.52%)`	:arrow_down:
... and 119 more

Nov 10 '22 20:11 codecov[bot]

Made a sandbox with this patch, most frequently used otel web instrumentations (document-load, user-interaction, xhr and fetch) and an artificial time drift simulator:

https://codesandbox.io/s/modest-fermat-4eu09e?file=/src/otel-fixes.ts

It doesn't improve this test case a lot (user interaction's in right time scale, as it's auto + auto), pretty much all cases of performance timestamps are still affected by drift

document load: performance time start + performance time end:

fetch: auto start + performance time end (duration becomes negative -> set to 0)

Nov 14 '22 16:11 t2t2

It doesn't improve this test case a lot (user interaction's in right time scale, as it's auto + auto), pretty much all cases of performance timestamps are still affected by drift

These instrumentations all use auto start and manual end times.

duration becomes negative -> set to 0

If the duration is negative it should be getting the drift offset correction applied, not just set to 0. If you look at the diagram in the linked comment there is no situation where it should be set to 0, so there must be some bad code path.

Nov 14 '22 19:11 dyladan

Updated codesandbox with pr changes as of now

This was the root of the 0-duration spans in your example too

Yep fixed on auto start + perf end, however span events (input: perf timestamp) also need correcting (now they end up outside the span w/ drift)

These instrumentations all use auto start and manual end times.

document-load spans don't, as they use manual (performance API) times for both start and end, and these still get drifted:

Nov 15 '22 09:11 t2t2

document-load spans don't, as they use manual (performance API) times for both start and end, and these still get drifted

I'm not sure what we can do there. If the user provides manual start and end times we can't correct drift because they may be intentionally describing a span that happened in the past.

Nov 15 '22 12:11 dyladan

If the user provides manual start and end times we can't correct drift because they may be intentionally describing a span that happened in the past.

There is currently an assumption that any numeric value lower than timeOrigin is a performance timestamp:

https://github.com/open-telemetry/opentelemetry-js/blob/1b7246abb49164ec5c2cff7b66956c454f230fed/packages/opentelemetry-core/src/common/time.ts#L88

So it's actually impossible to create spans in the past (w/ accounting for drift) anyway:

Nov 15 '22 13:11 t2t2

I'm not sure what we can do there. If the user provides manual start and end times we can't correct drift because they may be intentionally describing a span that happened in the past.

We can document how we create timestamps to allow users to keep in sync. Maybe provide some helper APIs. And use this mechanism at least in the instrumentations hosted here.

Nov 15 '22 14:11 Flarna

If the user provides manual start and end times we can't correct drift because they may be intentionally describing a span that happened in the past.

There is currently an assumption that any numeric value lower than timeOrigin is a performance timestamp:

https://github.com/open-telemetry/opentelemetry-js/blob/1b7246abb49164ec5c2cff7b66956c454f230fed/packages/opentelemetry-core/src/common/time.ts#L88

So it's actually impossible to create spans in the past (w/ accounting for drift) anyway:

Hmm so there is. That is disappointing and actually strikes me as a bug. The time code in core attempts to do way too much magic IMO and really limits us. That specific one can probably be fixed by changing it so that the heuristic is < performance.now() instead of < performance.timeOrigin. It would at least limit impossible times to epoch + time the page has been open.

I'm not sure what we can do there. If the user provides manual start and end times we can't correct drift because they may be intentionally describing a span that happened in the past.

We can document how we create timestamps to allow users to keep in sync. Maybe provide some helper APIs. And use this mechanism at least in the instrumentations hosted here.

I agree here documentation is a bare minimum and providing a helper API of some kind to correct drift would be a good idea

Nov 15 '22 20:11 dyladan

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

Jan 23 '23 06:01 github-actions[bot]

Superseded by https://github.com/open-telemetry/opentelemetry-js/pull/3434.

Jan 24 '23 15:01 legendecas

opentelemetry-js
opentelemetry-js copied to clipboard

fix(tracing): make spans resilient to performance clock drift

Codecov Report

opentelemetry-js opentelemetry-js copied to clipboard

fix(tracing): make spans resilient to performance clock drift

Codecov Report

opentelemetry-js
opentelemetry-js copied to clipboard