opentelemetry-specification
opentelemetry-specification copied to clipboard
Context propagation to client instrumentation
Changes
This spec change adds the notion of Client Propagators, needed for the use-cases described at #3811. It's not clear to me whether this change warrants an OTEP, and I'll gladly create one if needed.
Edit: here's a prototype showing how this can be used in the Go SDK: https://github.com/jpkrohling/opentelemetry-go/commit/c60efdff144f94da195a687ed2c1273c2193a6e5#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693
Related issues:
- #3811
- https://github.com/w3c/trace-context/issues/556
Checklist
- [x] Related issues
- [ ] Related OTEP(s): none
- [x] Links to the prototypes (when adding or changing features): https://github.com/jpkrohling/opentelemetry-go/commit/c60efdff144f94da195a687ed2c1273c2193a6e5#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693
- [x]
CHANGELOG.mdfile updated for non-trivial changes - [x]
spec-compliance-matrix.mdupdated if necessary
Signed-off-by: Juraci Paixão Kröhling [email protected]
cc @cedricziel, @pyohannes
I personally would find more intuitive something on the lines of PropagatorToCaller, CallerBackwardsPropagator or BackwardsPropagator
It is effectively a one-hop mechanism
Good call, I think it was implied in one of the paragraphs, but explicit is better than implicit. I'm careful here to not prescribe too much, leaving room for the concrete implementations. I could add one example using Server-Timing, making sure to note that it's not an example that might end up in an implementation (given how things can still move on the W3C TraceContext spec).
Of the suggested names, I think I prefer BackwardsPropagator the most. I'm just not sure whether it should be "backward" or "backwards"; perhaps a native speaker can help us here?
It is effectively a one-hop mechanism, since it is not possible to implement a multi-hop "backtrace" solution (e.g. sampling on errors) given the existing Context APIs.
@yurishkuro Would backpropagation im combination with deferred sampling qualify as a multi-hop solution, or do you have something else in mind?
With deferred sampling in place and the client changing its sampling decision based on back-propagated context, sampling decisions can eventually be back-propagated through multiple hops.
Of the suggested names, I think I prefer BackwardsPropagator the most.
I would rather go for ResponsePropagator, to emphasize one-hop, not a transitive backwards prop.
@yurishkuro Would back propagation in combination with deferred sampling qualify as a multi-hop solution, or do you have something else in mind?
@pyohannes yes, in theory, but there is no specification how that would work.
@jsuereth:
This specification change needs stability markers (and clearly outline new sections from existing stable sections).
Can you please take another look? I added stability markers to the new sections, but I think it's important to mention the new response propagator elsewhere. I marked the propagator as experimental there in one place but not in the overview. If you think it's not enough, let me know: I think the current state should be clear enough to determine what's experimental and what's stable, while still maintaining the readability.
I would make the one-hop nature explicit
I added a paragraph about this, let me know if you think it's sufficient:
The trace context obtained from response propagators are meant to be consumed only by the caller and shouldn't be propagated further back.
I think this is ready for a final review. I added a small PoC using the Go SDK as base, showing how this could work:
https://github.com/jpkrohling/opentelemetry-go/commit/c60efdff144f94da195a687ed2c1273c2193a6e5#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693
There's one open question, which I think we can handle as part of a follow-up discussion: what should a client do when there's already a trace context (like in a server<-server response). IMO, the incoming traceresponse should only be used to add a span link, but I'm not sure where this should be specified (if at all).
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@jsuereth, @trask, @yurishkuro and @pyohannes, could you please review this one again?
This looks great to me! Thanks so much @jpkrohling for spearheading this!
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Just commenting here to keep this alive.
so it's clear that ResponsePropagator is introduced as an experiment.
@jmacd, do you recommend an extra statement somewhere? I would have guessed that the "Experimental" markers would be sufficient for this, but I'm certainly open to making a stronger statement elsewhere.
The specification itself is much cleared up and easier to follow. Thanks for all the fixes and explicit stability markers.
The current prototype only include the propagator itself but not interaction with instrumentation. I would like to see a prototype of this against an HTTP framework via instrumentation, including showcasing some of the configuration portions of the specification (enabling/disabled-by-default).
Basically an end-to-end demo including showing the value of the data itself.
Getting into the specification means expansion into multiple languages and attempting to get users leveraging the feature. I'm not sure we're there yet (but I'm willing to be overruled here).
Its clear that more work is needed to research this area so that this spec could be iterated on. Is there a project plan and some group of people who are intending on continuing to drive this?
Count me in to that working group, please.
the OTel Java agent may be a good place to prototype this as there's already a hook to add response headers to all of the HTTP server instrumentation (which I believe is already being used by the Splunk distro to do something similar to this PR)
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Status update: I'm totally swamped right now, and asked for the Faro Web SDK folks here at Grafana Labs to help me with the PoC (cc @codecapitano). @johnbley , I shared your offer with them as well, I'd love to see some collaboration there :-)
Since I don't have any new events on my schedule until OTel Community Day, I should be able to clear my queue by the time the PoC gets done.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@jpkrohling Client Instrumentation SIG would be a good place to take this forward. We discussed this topic in the Client SIG with @johnbley a few weeks ago and we realized a couple things -
- That the traceresponse spec supports only span linking across browser span and server span, and not help with a parent-child relationship.
- The current document-load instrumentation suggests setting up a parent-child relationship in a single trace by having the server send its span's parent context in a meta tag by name
traceparent.
While both have their pros and cons, it would be good to discuss how we want to take this forward.
Based on this latest information, I'm marking this as draft.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Small update: @jpkrohling handed the task over and we (@codecapitano, @cedricziel) will continue the PoC work.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR was marked stale due to lack of activity. It will be closed in 7 days.