[Orchestrion] Unintended Root Span Generation by net/http Component
Version of orchestrion: v1.0.4
Describe what happened: When instrumenting with Orchestrion, a root span generated by the net/http component is attached above the router library's span. Is this span intentional?
For example, when instrumenting a Chi endpoint with Orchestrion, the generated spans appear as follows:
- Before Instrumentation (e.g., Chi)
func main() {
    r := chi.NewRouter()
    r.Use(Recoverer)
    r.Get("/ping", ping)
    log.Printf("Starting serviceB on port %v", addr)
    err := http.ListenAndServe(addr, r)
    if err != nil {
        log.Fatalf("Server failed: %v", err)
        return
    }
}
- After Orchestrion Instrumentation (excerpt)
func main() {
//line <generated>:1
    {
        defer tracer.Stop()
    }
//line /Users/xxx/server.go:51
    r :=
//line <generated>:1
        func() *chi.Mux {
            mux :=
//line /Users/xxx/server.go:51
                chi.NewRouter()
//line <generated>:1
            mux.Use(__orchestrion_chitrace.Middleware())
            return mux
        }()
//line /Users/xxx/server.go:52
    r.Use(Recoverer)
    r.Get("/ping", ping)
    log.Printf("Starting serviceB on port %v", addr)
    err := http.ListenAndServe(addr, r)
    if err != nil {
        log.Fatalf("Server failed: %v", err)
        return
    }
}

//....

//line <generated>:1
func init() {
    tracer.Start(tracer.WithOrchestrion(map[string]string{"version": "v1.0.4"}))
}
Based on the instrumentation results, it appears that the span generated by the router library's middleware should become the root span.
Since we rely on the root span's message for error tracking and other purposes, this behavior causes issues.
Describe what you expected: I expected the root span to be generated by the router library's middleware rather than by the net/http component.
Steps to Reproduce the Issue:
- Use any router library (e.g., Chi) in a Go application.
- Instrument the application using Orchestrion.
This is intentional, because all the layers are instrumented, so the call graph kind of looks like this: net/http.Server -> chi.Router -> your handler.
We surely could make the net/http.Server span stand out a little more (looking less like it's a full duplicate of the router span), but I guess this would not change the fact that the router span won't be the trace root.
Since the net/http server is instrumented, it generates a span that happens to be the root span in this case. For technical reasons, net/http is instrumented library-side, so you don't see it in the generated code, and it's not (currently) possible to opt out of it 😔
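To make the layering concrete, here is a rough manual equivalent of what the instrumented program ends up doing. This is only a sketch, not Orchestrion's actual output: the import aliases and the WrapHandler(handler, service, resource) arguments are assumptions based on the public dd-trace-go contribs.

package main

import (
    "log"
    "net/http"

    chitrace "github.com/DataDog/dd-trace-go/contrib/go-chi/chi.v5/v2"
    httptrace "github.com/DataDog/dd-trace-go/contrib/net/http/v2"
    "github.com/go-chi/chi/v5"
)

func ping(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("pong"))
}

func main() {
    r := chi.NewRouter()
    // Router-level instrumentation: creates the chi.Router span (a child span).
    r.Use(chitrace.Middleware())
    r.Get("/ping", ping)

    // Server-level instrumentation: creates the net/http span, which becomes the
    // trace root because it wraps the router from the outside.
    log.Fatal(http.ListenAndServe(":8080", httptrace.WrapHandler(r, "serviceB", "/")))
}

Because the server-level wrapper sits outside the router, its span is opened first and becomes the trace root, with the chi middleware span as its child.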
> Since we rely on the root span's message for error tracking and other purposes, this behavior causes issues.
Would you be able to elaborate a bit more on the use-case here... More specifically, why you can't get the message if it's not in the root span (is your query scoped to root spans only? a specific span name? etc...) -- it might help us identify the best path forward with this situation.
Hello! 👋 @sugar-cat7
As @RomainMuller says, the order is determined by the call graph, and since the net/http server call happens first, it will be the root span.
We are planning on implementing a configuration to disable certain integrations, so in this case you could disable the spans created by the net/http integration. Would this solve your issue?
@RomainMuller
Thank you. I understand now why the net/http span is being generated.
> Since we rely on the root span's message for error tracking and other purposes, this behavior causes issues.
When specifying DD_SERVICE, there's an issue where router errors are not propagated to Error Tracking.
Context:
- We use the service name as a query for Error Tracking.
- The service name is specified using the DD_SERVICE environment variable (UST).
https://github.com/user-attachments/assets/9bdba454-25e7-4afb-87c8-d2a448db1a33
- code
func main() {
    r := chi.NewRouter()
    r.Get("/ping", func(w http.ResponseWriter, r *http.Request) {
        err := errors.New("something wrong")
        if err != nil {
            span, _ := tracer.SpanFromContext(r.Context())
            span.SetTag("error.msg", err.Error())
            http.Error(w, "Failed to create request", http.StatusInternalServerError)
            return
        }
        w.Write([]byte("pong"))
    })
    log.Printf("Starting serviceB on port %v", addr)
    err := http.ListenAndServe(addr, r)
    if err != nil {
        log.Fatalf("Server failed: %v", err)
        return
    }
}
Because of this, the error.msg tag set at the router level does not propagate to the span generated by the net/http component.
Since that span is created internally by the library-side instrumentation, it is currently difficult to attach error details to it.
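One possible stop-gap, sketched below under the assumption that the library-side net/http instrumentation stores its span in the request context before invoking the handler: wrap the router in a plain handler so that tracer.SpanFromContext there returns the outer net/http span, and record the failure on it. The copyErrorToServerSpan and statusRecorder names are hypothetical helpers, not part of dd-trace-go; the file is meant to sit alongside the reproduction's main.go.

package main

import (
    "net/http"

    "github.com/DataDog/dd-trace-go/v2/ddtrace/tracer"
)

// copyErrorToServerSpan is a hypothetical wrapper (not part of dd-trace-go). It runs
// outside the chi middleware, so the span found in the request context here should be
// the one created by the net/http instrumentation, assuming that instrumentation has
// already stored its span in the request context.
func copyErrorToServerSpan(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next.ServeHTTP(rec, r)
        if rec.status >= http.StatusInternalServerError {
            if span, ok := tracer.SpanFromContext(r.Context()); ok {
                span.SetTag("error.msg", http.StatusText(rec.status))
            }
        }
    })
}

// statusRecorder captures the status code written by downstream handlers.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (s *statusRecorder) WriteHeader(code int) {
    s.status = code
    s.ResponseWriter.WriteHeader(code)
}

Wiring it in as http.ListenAndServe(addr, copyErrorToServerSpan(r)) would only mirror the HTTP status text, not the original error message, so it is at best a partial workaround.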
@rarguelloF
> We are planning on implementing a configuration to disable certain integrations, so in this case you could disable the spans created by the net/http integration. Would this solve your issue?
Yes, if this feature is implemented, it should resolve the issue.
Thanks for the reproduction code, that's very helpful!
And so I understand now - the core issue here is that the net/http instrumentation slaps an error.msg = "500: Internal Server Error" tag on the trace root, which I believe "hides" the underlying error set on the chi.Router span...
@rarguelloF, on top of your work to enable integration-level-disable-switches, I have an idea for a strategy that could work here... Since we're basically getting two layers of instrumentation (one via net/http.Server and the other via chi.Router); we could have the tracer implement something along the lines of:
- Add an API to the net/http contrib that allows registration of "tracing" handler types
- Update WrapHandler to be a no-op on types registered using the above API
- Have HTTP router contribs register their handler implementation using the above API (in the func init of the contrib, call httptrace.RegisterRoutingHandlerType[*chi.Router]())

This would allow us to systematically avoid double-tracing on known, already-traced router types, and would also allow user registration of handler types that should not be wrapped...
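For illustration only, here is a minimal sketch of what such a registry could look like inside the net/http contrib. Every name below mirrors the proposal above and is hypothetical; nothing like this exists in dd-trace-go today, and the real WrapHandler additionally accepts options.

package httptrace // hypothetical sketch of the proposed registry in the net/http contrib

import (
    "net/http"
    "reflect"
    "sync"
)

var (
    registryMu   sync.RWMutex
    routingTypes = map[reflect.Type]struct{}{}
)

// RegisterRoutingHandlerType marks T as a handler type that already produces its own
// spans (e.g. *chi.Router via the chi contrib), so wrapping it again would double-trace.
func RegisterRoutingHandlerType[T http.Handler]() {
    registryMu.Lock()
    defer registryMu.Unlock()
    routingTypes[reflect.TypeFor[T]()] = struct{}{}
}

// isRegisteredRouter reports whether h's concrete type was registered above.
func isRegisteredRouter(h http.Handler) bool {
    registryMu.RLock()
    defer registryMu.RUnlock()
    _, ok := routingTypes[reflect.TypeOf(h)]
    return ok
}

// WrapHandler (simplified) would then become a no-op for registered router types.
func WrapHandler(h http.Handler, service, resource string) http.Handler {
    if isRegisteredRouter(h) {
        return h // already traced by its own contrib middleware
    }
    // ... existing span-creating wrapper would go here ...
    return h
}

A router contrib would then call httptrace.RegisterRoutingHandlerType[*chi.Router]() from its func init, as described above.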
Just wanted to add that I am seeing this with our gin servers as well, which was confusing me too. Apparently the first span belongs to the net/http component and the second to gin-gonic/gin.
Both of them have their operation name defined as http.request, which makes me wonder whether that could lead to double-counting requests in DD when grouping by http.request?
We're seeing this issue, and I want to note that it also means double counting the raw request count in APM.
As many of you may have already noticed, one solution to this issue is to selectively choose integrations (likely added around v1.1.0).
You can modify orchestrion.tool.go as follows to exclude the net/http contrib. (This could be problematic if you also want net/http itself instrumented, but in most cases your handlers go through a wrapped router library, so it shouldn't be an issue.)
import (
    // Ensures `orchestrion` is present in `go.mod` so that builds are repeatable.
    // Do not remove.
    _ "github.com/DataDog/orchestrion" // integration

    // DataDog v2 tracer integrations
    _ "github.com/DataDog/dd-trace-go/contrib/go-chi/chi.v5/v2"
    _ "github.com/DataDog/dd-trace-go/v2/ddtrace/tracer"
)
Hi fellas,
Any update on this? It's fairly annoying :). Cheers!