cloudflared
Otel tracing
Hello,
I've noticed the addition of tracing via OpenTelemetry and being a Dynatrace customer, I have enabled the capturing of OpenTelemetry traces into Dynatrace.
The issue I am facing is that I can only see the requests made to the metrics endpoint (/metrics).
Am I doing something wrong? I can't see any of the proxied requests being captured or instrumented.
We do not currently support OpenTelemetry traces externally. The code you see in cloudflared to add tracing is for handling our internal tracing. You shouldn't have any issue adding your own traces through to your services via request headers.
As for requests provided from the exposed metric endpoint, does cloudflared_tunnel_concurrent_requests_per_tunnel, cloudflared_tunnel_total_requests, or promhttp_metric_handler_requests_total not meet your needs?
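For anyone who wants to read those counters programmatically, here is a minimal sketch of pulling the three metrics named above out of a Prometheus text-format payload. The sample payload, its values, and the helper name `read_metrics` are made up for illustration. Real cloudflared output will carry its own labels and values, and the simple regex here assumes label values contain no `}` characters.

```python
import re

# Metric names mentioned in the reply above.
WANTED = {
    "cloudflared_tunnel_concurrent_requests_per_tunnel",
    "cloudflared_tunnel_total_requests",
    "promhttp_metric_handler_requests_total",
}

# Hypothetical payload: values are invented, not real cloudflared output.
SAMPLE = """\
# illustrative comment line
cloudflared_tunnel_total_requests 1523
promhttp_metric_handler_requests_total{code="200"} 42
promhttp_metric_handler_requests_total{code="500"} 0
"""

# name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def read_metrics(text, wanted):
    """Sum the samples of each wanted metric across all label sets."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment/HELP/TYPE lines
        match = LINE_RE.match(line)
        if match and match.group(1) in wanted:
            name = match.group(1)
            totals[name] = totals.get(name, 0.0) + float(match.group(3))
    return totals


result = read_metrics(SAMPLE, WANTED)
for name, value in sorted(result.items()):
    print(name, value)
```

In practice the text would come from an HTTP GET against whatever address the metrics endpoint is exposed on. These are aggregate counters only, so they do not identify individual proxied requests.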
You shouldn't have any issue adding your own traces through to your services via request headers
Sorry could you elaborate further on this?
As for requests provided from the exposed metric endpoint, does cloudflared_tunnel_concurrent_requests_per_tunnel, cloudflared_tunnel_total_requests, or promhttp_metric_handler_requests_total not meet your needs?
These metrics are useful. However, with Cloudflare Tunnel being the entry point to our systems, it would be more appropriate to have the proxied requests traced. This would help Dynatrace's PurePath form better relationships between all entities.
We do not currently support OpenTelemetry traces externally
Is there a plan for this in future releases?
Sorry could you elaborate further on this?
For HTTP requests, cloudflared forwards headers to the origin service. This means you can attach any type of distributed tracing header to a request and it will make it to your origin service, for instance the HTTP headers written by OpenTelemetry context propagators.
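As a rough illustration of header-based propagation, the sketch below builds a W3C Trace Context `traceparent` header on the client side and parses it on the origin side, using only the standard library. The function names `make_traceparent` and `parse_traceparent` are hypothetical. A real setup would normally let an OpenTelemetry SDK propagator produce and consume this header rather than hand-rolling it.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(sampled=True):
    """Client side: build a traceparent value for a new trace."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value):
    """Origin side: return the trace context, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(value.strip().lower())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid per the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# The client attaches the header to its outgoing request...
outgoing_headers = {"traceparent": make_traceparent()}
# ...and the origin reads it back after the request has been proxied.
context = parse_traceparent(outgoing_headers["traceparent"])
print(context)
```

With this approach the client span and the origin span share one trace ID, so an APM can link them even though nothing in between emits a span of its own.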
To your point though, we currently provide little insight into the end-to-end latency of our internal services beyond the metrics endpoint, which exposes the Prometheus-based metrics. One solution is to do as I mentioned above: add tracing on your client service and on your origin service, and the latency in between the two traces would then be the total Cloudflare latency (including cloudflared and many other internal services).
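A minimal sketch of that calculation, assuming you have start and end times for the client span and the origin span of the same request. `SpanTiming` and `in_between_latency` are hypothetical names. The sketch compares durations rather than absolute timestamps because the two hosts' clocks may not be synchronized. The result is a round-trip figure that also includes ordinary network transit on either side of Cloudflare.

```python
from dataclasses import dataclass


@dataclass
class SpanTiming:
    """Start and end of one span, in seconds on that host's own clock."""
    start_s: float
    end_s: float

    @property
    def duration_s(self):
        return self.end_s - self.start_s


def in_between_latency(client, origin):
    """Time spent between the client span and the origin span.

    Uses each span's own duration, so clock offset between hosts cancels out.
    """
    gap = client.duration_s - origin.duration_s
    return max(gap, 0.0)  # guard against small negative values from jitter


# Made-up timings: the client saw 250 ms in total, the origin spent 125 ms.
client_span = SpanTiming(start_s=100.0, end_s=100.25)
origin_span = SpanTiming(start_s=5000.0, end_s=5000.125)
overhead = in_between_latency(client_span, origin_span)
print(overhead)  # 0.125
```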
Is there a plan for this in future releases?
We currently do not plan for the OpenTelemetry traces that we use for cloudflared to be exposed externally. @abelinkinbio would be a better person to track this as a request.
Thanks for flagging this one, Devin. This isn't something we're actively working on nor have on the "immediate" roadmap. That said, we do have plans to surface more granular Tunnel analytics within the Zero Trust dashboard which would provide better observability in general. As we make progress here, I'll be sure to keep this thread in the loop.
+1, this would be excellent for service-level alerting and troubleshooting, particularly with vendored or legacy services that can't be instrumented directly. At the moment I've resorted to parsing logs, which isn't a great experience (since the current in-memory traces don't appear to be dumped until shutdown).
We too (enterprise customer) would like to see better support here for OTEL. In essence, Cloudflare is a span which we would love to see in our request traces. This would solve a number of problems for us:
- Cloudflare Logpush, cloudflared, ingress-controller and our application logs could all be EASILY correlated (end to end).
- Many clients can't or won't set a parent span; the CF Logpush data already has a rich set of headers, which would make the parent span super rich in any modern APM too.
- Issues with dropped connections or crashes become easier to diagnose as they bubble up the layers, since the logs are all linked.