opentelemetry-js icon indicating copy to clipboard operation
opentelemetry-js copied to clipboard

feat: allow Resource to be created with some attributes resolving asynchronously

Open aabmass opened this issue 3 years ago • 9 comments
trafficstars

Which problem is this PR solving?

Fixes #2912

Short description of the changes

  • In @opentelemetry/sdk-node
    • Adds an optional Promise<ResourceAttributes> to the Resource constructor which can asynchronously resolve some attributes. When this promise resolves, the attributes will be merged with the "synchronous" attributes.
    • Deprecated detectResources() in favor of a new method detectResourcesSync() which defers promises into the resource to be resolved later
    • Update Detector.detect() signature to allow it to be synchronous and updated documentation that the synchronous variant should be used
  • In @opentelemetry/sdk-node, updated NodeSDK.start() and its dependencies to now be synchronous (doesn't return a promise). This is a breaking change, but that package is experimental.
  • Updated BatchSpanProcessorBase and SimpleSpanProcessor to await the resource promise before calling exporters, if the resource has asynchronous attributes AND they have not already resolved. I.e. if there is no promise in the Resource (the case for existing detectors and code), the behavior is unchanged.

There may be some test breakages in contrib because of the nature of asynchronous tests on fire-and-forget span.end(). Users shouldn't see any behavior change.

Type of change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) I don't think so?

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • [x] Existing tests pass with no modification
  • [x] Added new tests to Resource.test.ts
  • [x] Added detect-resources.test.ts

Checklist:

  • [x] Followed the style guidelines of this project
  • [x] Unit tests have been added
  • [x] Documentation has been updated

aabmass avatar Apr 27 '22 16:04 aabmass

Codecov Report

Merging #2933 (8435693) into main (3cc40d7) will decrease coverage by 1.32%. The diff coverage is 96.55%.

:exclamation: Current head 8435693 differs from pull request most recent head eb6446f. Consider uploading reports for the commit eb6446f to get more accurate results

@@            Coverage Diff             @@
##             main    #2933      +/-   ##
==========================================
- Coverage   92.64%   91.32%   -1.33%     
==========================================
  Files         187       68     -119     
  Lines        6169     1878    -4291     
  Branches     1303      399     -904     
==========================================
- Hits         5715     1715    -4000     
+ Misses        454      163     -291     
Impacted Files Coverage Δ
packages/opentelemetry-resources/src/Resource.ts 97.50% <96.55%> (-2.50%) :arrow_down:
packages/opentelemetry-resources/karma.worker.js 0.00% <0.00%> (-100.00%) :arrow_down:
...ckages/opentelemetry-sdk-trace-web/karma.worker.js 0.00% <0.00%> (-100.00%) :arrow_down:
...s/opentelemetry-core/src/platform/node/sdk-info.ts 0.00% <0.00%> (-100.00%) :arrow_down:
...opentelemetry-core/src/platform/node/globalThis.ts 0.00% <0.00%> (-100.00%) :arrow_down:
...pentelemetry-core/src/platform/node/performance.ts 0.00% <0.00%> (-100.00%) :arrow_down:
...emetry-api-metrics/src/platform/node/globalThis.ts 0.00% <0.00%> (-100.00%) :arrow_down:
...lemetry-resources/src/detectors/ProcessDetector.ts 31.81% <0.00%> (-68.19%) :arrow_down:
...perimental/packages/otlp-exporter-base/src/util.ts 79.41% <0.00%> (-17.65%) :arrow_down:
...ckages/opentelemetry-exporter-zipkin/src/zipkin.ts 84.48% <0.00%> (-15.52%) :arrow_down:
... and 120 more

codecov[bot] avatar Apr 27 '22 16:04 codecov[bot]

I really like this proposal - to propagate the async task down to resource attribute consumers. This is a backward-compatible (I think) and very clean way to address the issue.

The promise means that every time we need to access this resource attributes, the caller will have to await to get the full attributes values. I wonder if that is a constraint that is easy to fulfill.

For example, in aspecto's distro we read the resource attributes in a span processor and use the values to apply filtering rules on span attributes. This is done in the onEnd invocation which is a sync function of course. I guess we could wrap it with some mechanism that aggregates the processor spans until the resource is resolved, but that is not trivial.

Another possible issue is that we cannot have any guarantee on how long it will take for the promise to resolve. Until it resolves, spans might be piling up in the exporter export invocations, all waiting for the resource to be resolved so they can be sent on the wire.

Really interesting approach, looking forward to discussing this more.

blumamir avatar Apr 28 '22 00:04 blumamir

According to spec resource is immutable. I think delaying the read until first use in e.g. exporter sounds to me like it's still spec conform. I don't think the await can be avoided by any wrapping/API we add in common code. A consumer awaits on first use and may store the result for future use (maybe already serialized/compressed/... to avoid to redo this on every export).

Flarna avatar Apr 28 '22 05:04 Flarna

I wonder if top level await + esm preload would be able to achieve this as well? @aabmass / @Flarna , do you know?

EDIT

Played around with it a bit. For ESM, adding an loader module would work and is able to asyncronously wait for stuff to resolve. The thing is, it's only for ESM.

For CJS, there's no TLA, so we still need a solution for today.

rauno56 avatar May 11 '22 11:05 rauno56

Played around with it a bit. For ESM, adding an loader module would work and is able to asyncronously wait for stuff to resolve. The thing is, it's only for ESM.

That's quite cool. Is the limitation of only a single loader module removed yet?

dyladan avatar May 11 '22 12:05 dyladan

I think top Level await can be used but as far as I know it works only in ECMAScript module mode not for commonJs. I requires node.js 14.8 as far as I know. Not sure about browsers. Once we are in module mode we have to use loaders which are still experimental.

The limitation to have only single loader is gone on node master (see here), backport to 18.x has not yet happened as some other breaking changes in loaders are planned and target is to combine all breaking changes in a single 18.x release.

Flarna avatar May 12 '22 17:05 Flarna

I guess now span processors are expected to handle these async promises? We already have some users which have implemented custom span processors. I guess this is considered breaking for them? What does the spec say about breaking changes for sdk interface implementers?

I would consider this also as a breaking change so this would be semver major.

Flarna avatar Jun 22 '22 08:06 Flarna

This PR actually introduces another problem from the specification:

OnEnd(Span)

OnEnd is called after a span is ended (i.e., the end timestamp is already set). This method MUST be called synchronously within the Span.End() API, therefore it should not block or throw an exception.

https://github.com/open-telemetry/opentelemetry-specification/blob/9abbdd39d0b35f635f833f072013431da419894e/specification/trace/sdk.md#onendspan

dyladan avatar Jun 22 '22 17:06 dyladan

This PR actually introduces another problem from the specification:

OnEnd(Span)

OnEnd is called after a span is ended (i.e., the end timestamp is already set). This method MUST be called synchronously within the Span.End() API, therefore it should not block or throw an exception.

open-telemetry/opentelemetry-specification@9abbdd3/specification/trace/sdk.md#onendspan

I actually realized this isn't a problem. If it was, then the batch span processor would be not compliant.

dyladan avatar Jun 29 '22 15:06 dyladan

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar Aug 29 '22 06:08 github-actions[bot]

This PR was closed because it has been stale for 14 days with no activity.

github-actions[bot] avatar Sep 12 '22 06:09 github-actions[bot]