OpenTelemetry: (Go) plugins do not get a traceparent header. Regression in 3.2
Is there an existing issue for this?
- [X] I have searched the existing issues
Kong version ($ kong version)
3.2.2
Current Behavior
We have some Kong plugins written in Go using the go-pdk. They have OpenTelemetry tracing added. This is done directly with otel libraries as the go-pdk does not support traces yet https://github.com/Kong/go-pdk/issues/142
Under Kong 3.2 the traces are not linked because Kong does not set the traceparent header
Expected Behavior
Go plugins are called with traceparent header, like under Kong 3.1.1, so that traces are linked together.
Steps To Reproduce
- Enable opentelemetry plugin in Kong
- Have a go plugin with priority > 14 that also publishes opentelemetry traces
- Make a call so traces are generated
- Check observability platform
- Note that the trace from the go plugin is unrelated to the trace from kong
Here's a gist with a minimalish example: https://gist.github.com/dpiddock/137b2b63bd53a4a7f87878b8a1f39d7d
-
docker-compose up -d --build -
curl localhost:8001/echo -
curl localhost:8002/echo - Open Jaeger on http://localhost:16686
- Note that kong31's trace has 6 spans and kong32 has one trace with 5 spans and another trace with 1 span
sample log lines from plugin:
137b2b63bd53a4a7f87878b8a1f39d7d-kong31-1 | 2023/04/18 15:45:11 [info] 1133#0: *598 [kong-trace-plugin:1137] 2023/04/18 15:45:11 Traceparent: 00-3a5609fac5dfe1f34ab6584b90ebf696-8b92ce0bd7acce91-01, context: ngx.timer
137b2b63bd53a4a7f87878b8a1f39d7d-kong32-1 | 2023/04/18 15:45:14 [info] 1201#0: *151 [kong-trace-plugin:1205] 2023/04/18 15:45:14 Traceparent: , context: ngx.timer
Anything else?
I've tracked the regression down to this commit: https://github.com/Kong/kong/commit/64fabec2118a435088f0f6ff4640836a65c1c111
Moreso than the traceparent header, I think what non-Lua plugins really want/need is a way to get the current span and trace id for the request and phase that they are in - either through the plugin-specific context, the tracing PDK api, or some other mechanism. Even if the header is passed into the plugin from the initial request, the span hierarchy won't be correct since a new span may have been created for the plugin's execution in a specific phase.
@dpiddock Thank you for reporting this issue and providing clear and comprehensive steps to reproduce it. Based on the information you've provided, here's what I believe is happening:
In Kong 3.1
- Kong receives a request (
curl localhost:8001/echo) without a traceparent header - The OpenTelemetry plugin executes first (OTel used to read headers in the rewrite phase in 3.1), it finds no tracing headers, so it generates a new
trace_idand injectstraceparentin the current request. - Your custom plugin executes next (Access phase) and finds
traceparentin the request, it uses it to append spans to the same trace.
Now, as you correctly pointed out, this commit changed things a bit, OTel now reads headers in the Access phase, which means the following happens:
In Kong 3.2
- Kong receives a request (
curl localhost:8001/echo) without a traceparent header - Your custom plugin executes next (Access phase, priority
802) and finds no tracing headers in the request, a new trace id is generated by the opentelemetry API, which is then not set to thetraceparentheader of the current request - The OpenTelemetry plugin executes (Access phase, priority
14), it (again) finds no tracing headers, so it generates a newtrace_id(and reports spans to a different trace, as you can observe from Jaeger)
The behavior above is expected, due to the change in the order of execution of the plugins. To obtain in Kong 3.2 the same behavior as 3.1 you could:
- run your plugin with a priority < 14 (e.g.
13) assuming this is possible based on your specific needs - alternatively, leave the priorities unchanged (your plugin runs before OTel in the Access phase) and have your plugin generate the trace ID and communicate it to Opentelemetry by injecting
traceparentin the current request using the PDK:kong.ServiceRequest.SetHeader("traceparent", traceparent). In this case it will be necessary to assign thetraceparentvariable the correct value first.
Let me know if that helps
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@samugi unfortunately we cannot have the plugins set to such a low priority.
I tried injecting a generated traceparent header however this doesn't quite work correctly as the go plugins do not have access to Kong's span id. But that's not the biggest issue. Injecting the traceparent header then causes the Kong opentelemetry plugin to think that it's not the root trace. It extracts the parent span ID from the header and sets it as its parent. So the traces from Kong and the plugins end up referencing a non-existent parent.
@dpiddock thanks for sharing additional details. I believe for your custom implementation to work as is, it would be necessary to execute the plugin after Opentelemetry (like in 3.1), as it relies on that plugin's information as discussed above. Modifying the priority or executing the code in a different handler (that runs after "Access") are the only ways to achieve this.
To recap, this is not a bug, the behavior is expected as a consequence of this change which resulted in the order of execution of the Opentelemetry plugin and your custom plugin to be swapped around.
I think @backjo 's comment may also apply. If the tracing context (trace ID, parent ID, etc) were made available in the shared context, the custom plugin would have all that information available. Would such a feature be helpful to you as well?
@samugi I've gone down the rabbit hole on this one and ended up writing a small Lua plugin that can generate the traceparent header and store it in the shared context. The other plugins then have to use the traceparent from the shared context and not the HTTP request.
I updated the gist: https://gist.github.com/dpiddock/137b2b63bd53a4a7f87878b8a1f39d7d#file-handler-lua
@dpiddock glad you found a workaround! This code will only work for w3c (traceparent) headers and not with other formats (which is fine if it solves your specific use case). Note: you should be able to use a pre-function plugin to execute the same logic with no need to implement the full Lua plugin.
Let me know if you are ok to close this, @backjo for what concerns your specific request I think it would be ideal to open a separate issue, could you please do that?
@samugi I've written a hack that, as you rightly point out, only works for our particular use case. It would be much better if Kong provided the correct headers when executing Go plugins or, as backjo commented, if the go-pdk exposed the necessary trace information. That way the plugins could fetch the trace ID and their parent ID. Currently with Kong 3.3 the Go plugins appear at the same level as internal Kong traces and overlap time stamps with their parent kong.$phase.plugin.$plugin_name trace.
if the go-pdk exposed the necessary trace information
I believe this would be the most flexible solution, @dpiddock. We could keep this issue open to track the progress of that effort, it would be great if you could update the description to reflect the request. This may be implemented as part of an upcoming rework of tracing. I will make sure to link any relevant PR here.
@samugi I've updated the description
This issue is marked as stale because it has been open for 14 days with no activity.
Dear contributor,
We are automatically closing this issue because it has not seen any activity for three weeks. We're sorry that your issue could not be resolved. If any new information comes up that could help resolving it, please feel free to reopen it.
Your contribution is greatly appreciated!
Please have a look our pledge to the community for more information.
Sincerely, Your Kong Gateway team