routing-release icon indicating copy to clipboard operation
routing-release copied to clipboard

W3C implementation generates a new parentID for each request

Open braunsonm opened this issue 2 years ago • 13 comments

Issue

As part of this PR: https://github.com/cloudfoundry/gorouter/pull/261

W3C support was added which is great! However it seems like the functionality was decided to make new parentIDs when you receive a new request. This doesn't seem like the correct behaviour for the gorouter since it is only proxying the request and is not a member of the trace. As far as I know, gorouter does not have support for reporting to OTEL or Zipkin tracers and only acts as a helpful proxy with this header support.

https://github.com/cloudfoundry/gorouter/pull/261/files#diff-4c0749740a74100927f204e1e56dfa5334d5a43ae2a57f3d6dfafccfa7c02897R135-R156

The effect of this is traces using W3C are always disconnected when routed through the gorouter because the next hop receives and records a child span that is not connected to any known parent spans.

Affected Versions

All since this PR was merged: https://github.com/cloudfoundry/gorouter/pull/261

Traffic Diagram

HAProxy ---> Gorouter ---> App ---> Gorouter ---> App

Steps to Reproduce

  1. Have 2 endpoints with tracing
  2. Have app1 call app2 through the gorouter
  3. Notice in the trace report the spans are disconnected with the exception of being in the same trace

Expected result

ParentIDs should not be recreated, they should be transparently passed to the next hop.

Current result

ParentIDs are always recreated, disconnecting your spans from their children.

Possible Fix

See linked lines above. Next() should not be called or generate a new parentID. cc @tlwr not sure if you still help out with this project or if you could provide some rationale.

braunsonm avatar Jan 18 '23 22:01 braunsonm

I think the rationale for this was an attempt to follow the W3C specification as described here: https://www.w3.org/TR/trace-context/#a-traceparent-is-received

However gorouter should be following this part of the spec: https://www.w3.org/TR/trace-context/#alternative-processing Since it is specifically acting as a proxy in this case and not reporting its own processing spans.

Proxies or messaging middleware MAY decide not to modify the traceparent headers but remove invalid headers or add additional information to tracestate.

This is the approach that should be followed to ensure end-to-end traces are propagated correctly. This is how it currently works in the B3 implementation in the gorouter.

braunsonm avatar Jan 18 '23 22:01 braunsonm

I have not worked on anything CF related for some time, and have forgotten most of the context

If I remember correctly, my logic for this was twofold:

  • I wanted gorouter to have its own span, so I could explicitly evaluate the impact of Gorouter in the trace (rather than only implicitly)
  • something to do with route services ?

Depending on your use-case, the behaviour of gorouter should/could be different, and I have no strong opinion to changing behaviour

tlwr avatar Jan 25 '23 10:01 tlwr

👋 Hi @tlwr! It's good to see you around again.

Thank you for opening this @braunsonm, apologizes for the delayed response. Your explanations about why ParentIDs should not be recreated makes sense to me. But, as always, I am worried about breaking people who rely on previous functionality. Though perhaps it could be considered a bug?

@ebroberson has been investigating what it would take to introduce zipkin headers for component logs in CF.

❓ @ebroberson - what is your thought on this? In your ideal situation do you agree that gorouter should not be recreating the ParentID?

ameowlia avatar Jan 26 '23 22:01 ameowlia

Hi @braunsonm! Thanks for bringing this up!

The current implementation is in line with the W3C spec, since "proxies MAY choose not to modify it". To me, this implies that the default behavior would be to modify the traceparent.

I can definitely see your use case, though, and support this being a configurable property. I'd love to see this get PR'd.

ebroberson avatar Jan 27 '23 17:01 ebroberson

I disagree @ebroberson. The line MAY is contextually for cases where the proxy is also reporting its own processing, it is then optional for that component to modify the traceparent. Since the gorouter does not report spans, it should not be modifying traceparent.

If you want to read it that way, then the spec says:

Update parent-id: The value of property parent-id MUST be set to a value representing the ID of the current operation.

The "current operation" in this case is never reported to the tracer.. so I don't think this mode of operation is relevant.

I'd love to know the reason you think the gorouter should behave this way, it serves no purpose, and even worse it disconnects spans from any apps using W3C properly. This is definitely a bug.

braunsonm avatar Jan 27 '23 21:01 braunsonm

My memory is a bit hazy, but I think I was configuring gorouter access logs to log the headers and then shipped the logs off to a tracing system so I could see what gorouter was doing. With the idea to visualise within the gorouter span a route service and the upstream app

For use cases where the gorouter span is not being shipped then breaking the trace is bad

Another feature would be shipping the traces directly without needing a log watcher

Met vriendelijke groet,

Toby Lorne

Sent from Proton Mail for iOS

Op vr 27 jan. , Braunson @.***> schreef:

I disagree @.***(https://github.com/ebroberson). The line MAY is specifically for cases where the proxy is also reporting its own processing, it is then optional for that component to modify the traceparent. Since the gorouter does not report spans, it should not be modifying traceparent

Please tell me the reason you think the gorouter should behave this way, it serves no purpose, and even worse it disconnects spans from any apps using W3C properly. This is definitely a bug.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

tlwr avatar Jan 28 '23 15:01 tlwr

References to upcoming work subject to change.

In some upcoming work for zipkin headers, gorouter will be assigning its own span-id, so the keeping current W3C implementation was for consistency purposes and to keep from potentially breaking projects that may be relying on this functionality. W3C headers unfortunately don't have the pair of span-id+parent-span-id in addition to the trace-id, which is why I suggest that the default functionality be preserved, but be configurable so that we can account for both use cases.

ebroberson avatar Jan 30 '23 17:01 ebroberson

keep from potentially breaking projects that may be relying on this functionality

Again I challenge this @ebroberson.. What user is relying on the fact that gorouter is breaking the connection of spans? Can you actually point to any use case where this would be desired behaviour?

W3C headers unfortunately don't have the pair of span-id+parent-span-id in addition to the trace-id, which is why I suggest that the default functionality be preserved,

I'm not sure what your point is here... W3C does have the parent span ID in the traceparent. It wouldn't make sense to include the "current span-id", as that is implicitly the parent span ID.

I'm not intentionally being combative on this issue, but I cannot follow the logic that this would ever be desired behavior or adherent to the specification and this is not the same behaviour as the gorouters own B3 implementation. In the end though I suppose if it's an option that's fine, as long as it is added 😄

braunsonm avatar Jan 30 '23 17:01 braunsonm

I'm not saying that it's desired behavior, just existing behavior that must be taken into account.

My point there was that zipkin headers do have a span-id and parent-span-id, which allows gorouter to set those headers and preserve the steps in the trace.

Upon further reading of the spec, it looks like the tracestate header is used like the span-id of the B3 header, and traceparent serves the same purpose of the parent-span-id. Does that match your understanding? If that is how the W3C headers are used, I'd be fine with changing the behavior to reflect that like you've suggested. @ameowlia do you have any further insight into this?

ebroberson avatar Jan 30 '23 17:01 ebroberson

We're working on being able to export traces from gorouter using the otel-collector release. cloudfoundry/gorouter#407 is the first piece of this work.

@braunsonm If you're still interested we'd love it if you were able to deploy this after we've got the otel-collector changes done and see if the traces can show up in your tracing system without gaps

mkocher avatar Apr 04 '24 23:04 mkocher

Would definitely be interested in trying it when it's released @mkocher

Do you have a roadmap? In order to use it on our side we would want a way to provide basic auth and a tenancy header for Tempo.

braunsonm avatar Apr 05 '24 00:04 braunsonm

Hi @braunsonm,

Is this issue still oustanding with the merge of https://github.com/cloudfoundry/gorouter/commit/e7dfb8a676ede0a2a4bd0b6a1459abc0100f7be8 ?

MarcPaquette avatar Oct 11 '24 20:10 MarcPaquette

With recent changes to the otel-collector release you can now provide just about any configuration to the otel collector for trace exporters, so setting headers and basic auth should be possible. Please let us know how it goes. The best place for feedback is the #logging-and-metrics cf slack channel.

mkocher avatar Oct 11 '24 21:10 mkocher