opentelemetry-specification icon indicating copy to clipboard operation
opentelemetry-specification copied to clipboard

Context propagation to client instrumentation

Open jpkrohling opened this issue 1 year ago • 13 comments

Changes

This spec change adds the notion of Client Propagators, needed for the use-cases described at #3811. It's not clear to me whether this change warrants an OTEP, and I'll gladly create one if needed.

Edit: here's a prototype showing how this can be used in the Go SDK: https://github.com/jpkrohling/opentelemetry-go/commit/c60efdff144f94da195a687ed2c1273c2193a6e5#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693

Related issues:

  • #3811
  • https://github.com/w3c/trace-context/issues/556

Checklist

  • [x] Related issues
  • [ ] Related OTEP(s): none
  • [x] Links to the prototypes (when adding or changing features): https://github.com/jpkrohling/opentelemetry-go/commit/c60efdff144f94da195a687ed2c1273c2193a6e5#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693
  • [x] CHANGELOG.md file updated for non-trivial changes
  • [x] spec-compliance-matrix.md updated if necessary

Signed-off-by: Juraci Paixão Kröhling [email protected]

jpkrohling avatar Jan 15 '24 13:01 jpkrohling

cc @cedricziel, @pyohannes

jpkrohling avatar Jan 15 '24 13:01 jpkrohling

I personally would find more intuitive something on the lines of PropagatorToCaller, CallerBackwardsPropagator or BackwardsPropagator

mmanciop avatar Jan 15 '24 18:01 mmanciop

It is effectively a one-hop mechanism

Good call, I think it was implied in one of the paragraphs, but explicit is better than implicit. I'm careful here to not prescribe too much, leaving room for the concrete implementations. I could add one example using Server-Timing, making sure to note that it's not an example that might end up in an implementation (given how things can still move on the W3C TraceContext spec).

jpkrohling avatar Jan 16 '24 08:01 jpkrohling

Of the suggested names, I think I prefer BackwardsPropagator the most. I'm just not sure whether it should be "backward" or "backwards"; perhaps a native speaker can help us here?

jpkrohling avatar Jan 16 '24 08:01 jpkrohling

It is effectively a one-hop mechanism, since it is not possible to implement a multi-hop "backtrace" solution (e.g. sampling on errors) given the existing Context APIs.

@yurishkuro Would backpropagation im combination with deferred sampling qualify as a multi-hop solution, or do you have something else in mind?

With deferred sampling in place and the client changing its sampling decision based on back-propagated context, sampling decisions can eventually be back-propagated through multiple hops.

pyohannes avatar Jan 16 '24 09:01 pyohannes

Of the suggested names, I think I prefer BackwardsPropagator the most.

I would rather go for ResponsePropagator, to emphasize one-hop, not a transitive backwards prop.

yurishkuro avatar Jan 16 '24 20:01 yurishkuro

@yurishkuro Would back propagation in combination with deferred sampling qualify as a multi-hop solution, or do you have something else in mind?

@pyohannes yes, in theory, but there is no specification how that would work.

yurishkuro avatar Jan 16 '24 22:01 yurishkuro

@jsuereth:

This specification change needs stability markers (and clearly outline new sections from existing stable sections).

Can you please take another look? I added stability markers to the new sections, but I think it's important to mention the new response propagator elsewhere. I marked the propagator as experimental there in one place but not in the overview. If you think it's not enough, let me know: I think the current state should be clear enough to determine what's experimental and what's stable, while still maintaining the readability.

jpkrohling avatar Jan 23 '24 13:01 jpkrohling

I would make the one-hop nature explicit

I added a paragraph about this, let me know if you think it's sufficient:

The trace context obtained from response propagators are meant to be consumed only by the caller and shouldn't be propagated further back.

jpkrohling avatar Jan 23 '24 13:01 jpkrohling

I think this is ready for a final review. I added a small PoC using the Go SDK as base, showing how this could work:

https://github.com/jpkrohling/opentelemetry-go/commit/c60efdff144f94da195a687ed2c1273c2193a6e5#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693

There's one open question, which I think we can handle as part of a follow-up discussion: what should a client do when there's already a trace context (like in a server<-server response). IMO, the incoming traceresponse should only be used to add a span link, but I'm not sure where this should be specified (if at all).

jpkrohling avatar Feb 01 '24 15:02 jpkrohling

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Feb 10 '24 03:02 github-actions[bot]

@jsuereth, @trask, @yurishkuro and @pyohannes, could you please review this one again?

jpkrohling avatar Feb 13 '24 10:02 jpkrohling

This looks great to me! Thanks so much @jpkrohling for spearheading this!

johnbley avatar Feb 20 '24 16:02 johnbley

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Mar 22 '24 03:03 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Just commenting here to keep this alive.

johnbley avatar Mar 22 '24 11:03 johnbley

so it's clear that ResponsePropagator is introduced as an experiment.

@jmacd, do you recommend an extra statement somewhere? I would have guessed that the "Experimental" markers would be sufficient for this, but I'm certainly open to making a stronger statement elsewhere.

jpkrohling avatar Apr 04 '24 08:04 jpkrohling

The specification itself is much cleared up and easier to follow. Thanks for all the fixes and explicit stability markers.

The current prototype only include the propagator itself but not interaction with instrumentation. I would like to see a prototype of this against an HTTP framework via instrumentation, including showcasing some of the configuration portions of the specification (enabling/disabled-by-default).

Basically an end-to-end demo including showing the value of the data itself.

Getting into the specification means expansion into multiple languages and attempting to get users leveraging the feature. I'm not sure we're there yet (but I'm willing to be overruled here).

jsuereth avatar Apr 04 '24 16:04 jsuereth

Its clear that more work is needed to research this area so that this spec could be iterated on. Is there a project plan and some group of people who are intending on continuing to drive this?

Count me in to that working group, please.

johnbley avatar Apr 11 '24 17:04 johnbley

the OTel Java agent may be a good place to prototype this as there's already a hook to add response headers to all of the HTTP server instrumentation (which I believe is already being used by the Splunk distro to do something similar to this PR)

trask avatar Apr 12 '24 00:04 trask

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Apr 19 '24 03:04 github-actions[bot]

Status update: I'm totally swamped right now, and asked for the Faro Web SDK folks here at Grafana Labs to help me with the PoC (cc @codecapitano). @johnbley , I shared your offer with them as well, I'd love to see some collaboration there :-)

Since I don't have any new events on my schedule until OTel Community Day, I should be able to clear my queue by the time the PoC gets done.

jpkrohling avatar Apr 19 '24 12:04 jpkrohling

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Apr 27 '24 03:04 github-actions[bot]

@jpkrohling Client Instrumentation SIG would be a good place to take this forward. We discussed this topic in the Client SIG with @johnbley a few weeks ago and we realized a couple things -

  1. That the traceresponse spec supports only span linking across browser span and server span, and not help with a parent-child relationship.
  2. The current document-load instrumentation suggests setting up a parent-child relationship in a single trace by having the server send its span's parent context in a meta tag by name traceparent.

While both have their pros and cons, it would be good to discuss how we want to take this forward.

scheler avatar Apr 29 '24 22:04 scheler

Based on this latest information, I'm marking this as draft.

jpkrohling avatar Apr 30 '24 08:04 jpkrohling

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar May 08 '24 03:05 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar May 16 '24 03:05 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar May 29 '24 03:05 github-actions[bot]

Small update: @jpkrohling handed the task over and we (@codecapitano, @cedricziel) will continue the PoC work.

codecapitano avatar May 29 '24 07:05 codecapitano

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Jun 06 '24 03:06 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions[bot] avatar Jun 15 '24 03:06 github-actions[bot]