dd-trace-go icon indicating copy to clipboard operation
dd-trace-go copied to clipboard

[Orchestrion] Unintended Root Span Generation by net/http Component

Open sugar-cat7 opened this issue 11 months ago • 9 comments

Version of orchestrion orchestrion v1.0.4

Describe what happened: When instrumenting with Orchestrion, a RootSpan generated by the net/http component is attached above the router library. Is this span intentional?

For example, when instrumenting a Chi endpoint with Orchestrion, the generated spans appear as follows:

Image
  • Before Instrumentation(e.g., Chi)
func main() {
	r := chi.NewRouter()
	r.Use(Recoverer)
	r.Get("/ping", ping)

	log.Printf("Starting serviceB on port %v", addr)
	err := http.ListenAndServe(addr, r)
	if err != nil {
		log.Fatalf("Server failed: %v", err)
		return
	}
}
  • After Orchestrion Instrumentation (excerpt)
func main() {
//line <generated>:1
	{
		defer tracer.Stop()
	}
//line /Users/xxx/server.go:51
	r :=
//line <generated>:1
		func() *chi.Mux {
			mux :=
//line /Users/xxx/server.go:51
				chi.NewRouter()
//line <generated>:1
			mux.Use(__orchestrion_chitrace.Middleware())
			return mux
		}()
//line /Users/xxx/server.go:52
	r.Use(Recoverer)
	r.Get("/ping", ping)

	log.Printf("Starting serviceB on port %v", addr)
	err := http.ListenAndServe(addr, r)
	if err != nil {
		log.Fatalf("Server failed: %v", err)
		return
	}
}

//....

//line <generated>:1
func init() {
	tracer.Start(tracer.WithOrchestrion(map[string]string{"version": "v1.0.4"}))
}

Based on the instrumentation results, it appears that the span generated by the router library's middleware should become the root span.

Since we rely on the root span's message for error tracking and other purposes, this behavior causes issues.

Describe what you expected: Expected the root span to be generated by the router library's middleware rather than the net/http component.

Steps to Reproduce the Issue:

  1. Use any router library (e.g., Chi) in a Go application.
  2. Instrument the application using Orchestrion.

sugar-cat7 avatar Feb 05 '25 12:02 sugar-cat7

This is intentional, because all the layers are instrumented, so the call graph kind of looks like this: net/http.Server -> chi.Router -> your handler.

We surely could make the net/http.Server span stand out a little more (looking less like it's a full duplicate of the router span), but I guess this would not change the fact that the router span won't be the trace root.

Since the net/http server is instrumented, it generates a span that happens to be the root span in this case. For technical reasons, net/http is instrumented library-side, so you don't see it in the generated code, and it's not (currently) possible to opt out of it 😔

RomainMuller avatar Feb 05 '25 13:02 RomainMuller

Since we rely on the root span's message for error tracking and other purposes, this behavior causes issues.

Would you be able to elaborate a bit more on the use-case here... More specifically, why you can't get the message if it's not in the root span (is your query scoped to root spans only? a specific span name? etc...) -- it might help us identify the best path forward with this situation.

RomainMuller avatar Feb 05 '25 14:02 RomainMuller

Hello! 👋 @sugar-cat7

As @RomainMuller says, the order is determined my the call graph, and since the net/http server call happens first, it will be the root span.

We are planing on implementing a configuration to disable certain integrations, so in this case you could disable the spans created by the net/http integration. Would this solve your issue?

rarguelloF avatar Feb 05 '25 14:02 rarguelloF

@RomainMuller Thank you. I understand now why the net/http span is being generated.

Since we rely on the root span's message for error tracking and other purposes, this behavior causes issues.

When specifying DD_SERVICE, there's an issue where router errors are not propagated to Error Tracking.

Context:

  • We use the service name as a query for Error Tracking.
  • The service name is specified using the DD_SERVICE environment variable (UST).

https://github.com/user-attachments/assets/9bdba454-25e7-4afb-87c8-d2a448db1a33

  • code
func main() {
	r := chi.NewRouter()
	r.Get("/ping", func(w http.ResponseWriter, r *http.Request) {
		err := errors.New("something wrong")
		if err != nil {
			span, _ := tracer.SpanFromContext(r.Context())
			span.SetTag("error.msg", err.Error())
			http.Error(w, "Failed to create request", http.StatusInternalServerError)
			return
		}
		w.Write([]byte("pong"))
	})

	log.Printf("Starting serviceB on port %v", addr)
	err := http.ListenAndServe(addr, r)
	if err != nil {
		log.Fatalf("Server failed: %v", err)
		return
	}
}

Because of this, the error.msg field associated with the router does not propagate to the span generated by the net/http component.
Since this process happens internally, it is currently difficult to propagate error details to the net/http component.

@rarguelloF

We are planing on implementing a configuration to disable certain integrations, so in this case you could disable the spans created by the net/http integration. Would this solve your issue?

Yes, if this feature is implemented, it should resolve the issue.

sugar-cat7 avatar Feb 06 '25 05:02 sugar-cat7

Thanks for the reproduction code, that's very helpful!

And so I understand now - the core issue here is that the net/http instrumentation slaps an error.msg = 500: Internal Server Error on the trace root, which I believe "hides" the underlying error set on the chi.Router span...


@rarguelloF, on top of your work to enable integration-level-disable-switches, I have an idea for a strategy that could work here... Since we're basically getting two layers of instrumentation (one via net/http.Server and the other via chi.Router); we could have the tracer implement something along the lines of:

  • Add an API to the net/http contrib that allows registration of "tracing" handler types
  • Update WrapHanlder to be a no-op on types registered using the above API
  • Have HTTP router contribs register their handler implementation using the above API (in the func init of the contrib, call httptrace.RegisterRoutingHandlerType[*chi.Router]())

This would allow us to systematically avoid double-tracing on know already-traced router types; and would also allow user-registration of handler types that should not be wrapped...

RomainMuller avatar Feb 10 '25 10:02 RomainMuller

Just wanted to add that I am seeing this with our gin servers as well, which was confusing me too. Apparently the 1st span belongs to the net/http component and the second to gin-gonic/gin:

Image

Both of them have their operation name defined as http.request, which makes me wonder whether that could lead to double-counting requests in DD when grouping by http.request?

takanuva15 avatar Apr 24 '25 14:04 takanuva15

We're seeing this issue, and I want to note that it also means double counting the raw request count in APM.

buley avatar Apr 24 '25 18:04 buley

As many of you may have already noticed, one solution to this issue is to selectively choose integrations (likely added around v1.1.0). You can modify orchestrion.tool.go as follows to exclude the net/http contrib. (While this could be problematic in many cases where you want to instrument net/http as well, in most cases you're using wrapped libraries, so it shouldn't be an issue.)

import (
    // Ensures `orchestrion` is present in `go.mod` so that builds are repeatable.
    // Do not remove.
    _ "github.com/DataDog/orchestrion" // integration

    // DataDog v2 tracer integrations
    _ "github.com/DataDog/dd-trace-go/contrib/go-chi/chi.v5/v2"
    _ "github.com/DataDog/dd-trace-go/v2/ddtrace/tracer"
)

sugar-cat7 avatar Jul 21 '25 15:07 sugar-cat7

Hi fellas,

Any update on this? It's fairly annoying :). Cheers!

ian-axelrod-sl avatar Nov 05 '25 20:11 ian-axelrod-sl