opentelemetry-go-contrib
                                
                                 opentelemetry-go-contrib copied to clipboard
                                
                                    opentelemetry-go-contrib copied to clipboard
                            
                            
                            
                        otelhttptrace: http.receive spans not finished
Description
Spans with the resource name http.receive are being created, but never finished in the lifetime of the program.
Environment
- OS: iOS
- Architecture: x85
- Go Version: go1.21.6 darwin/arm64
- otelhttptraceversion: v0.47.0
Steps To Reproduce
I am using a Datadog tracer for testing, with the following client code, using the example here:
package main
import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptrace"
	ddotel "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/opentelemetry"
	ddtracer "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
	"go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
)
func main() {
	provider := ddotel.NewTracerProvider(ddtracer.WithDebugMode(true))
	defer provider.Shutdown()
	otel.SetTracerProvider(provider)
	client := http.Client{
		Transport: otelhttp.NewTransport(
			http.DefaultTransport,
			otelhttp.WithClientTrace(func(ctx context.Context) *httptrace.ClientTrace {
				return otelhttptrace.NewClientTrace(ctx)
			}),
		),
	}
	resp, err := client.Get("http://example.com")
	if err != nil {
		fmt.Println(error)
		os.Exit(0)
	}
	defer resp.Body.Close()
}
When I run this code and check the debug logs, I can see that 7 spans are getting started, but only 6 spans are getting finished. The one not getting finished has the following debug line:
2024/01/30 16:20:32 Datadog Tracer v1.61.0-dev DEBUG: Started Span: dd.trace_id="6805150280004253875" dd.span_id="1206165090554086340" dd.parent_id="6805150280004253875", Operation: http.receive, Resource: http.receive, Tags: map[language:go runtime-id:aad56e8c-1c67-48f0-82e2-e6d66ec77559 span.kind:client], map[_sampling_priority_v1:1 process_id:56444]
This is a span named http.receive, which appears to be getting created here: https://github.com/open-telemetry/opentelemetry-go-contrib/blob/84370754c03498903d8618a210ab3dd5ae6e2059/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L297
I can see in the code that there is a function present for finishing this span, but it doesn't appear to be getting called. I don't see any unit tests that use putIdleConn, whereas the other similar methods have them, so I wonder if there could perhaps be a bug where putIdleConn is not being executed as expected.
Expected behavior
I expect all spans that are created to be finished before the program exits. i.e. there should be a line that looks something like this as well:
2024/01/30 16:20:32 Datadog Tracer v1.61.0-dev DEBUG: Finished Span: dd.trace_id="6805150280004253875" dd.span_id="1206165090554086340" dd.parent_id="6805150280004253875", Operation: http.receive, Resource: http.receive, Tags: map[language:go runtime-id:aad56e8c-1c67-48f0-82e2-e6d66ec77559 span.kind:client], map[_sampling_priority_v1:1 process_id:56444]
Thanks for helping with the bug report. Do you have the same issue if you use the default SDK with a STDOUT exporter? E.g.
https://github.com/open-telemetry/opentelemetry-go/blob/8d3ae4c5ede1a4c6ceb769f43cc759fdbf3d28f8/exporters/stdout/stdouttrace/example_test.go#L65-L78
Just tested with that example, using this code:
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		fmt.Printf("creating stdout exporter: %v\n", err)
		os.Exit(0)
	}
	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(Resource()),
	)
	ctx := context.Background()
	defer tracerProvider.Shutdown(ctx)
	otel.SetTracerProvider(tracerProvider)
	client := http.Client{
		Transport: otelhttp.NewTransport(
			http.DefaultTransport,
			otelhttp.WithClientTrace(func(ctx context.Context) *httptrace.ClientTrace {
				return otelhttptrace.NewClientTrace(ctx)
			}),
		),
	}
	resp, err := client.Get("http://example.com")
	if err != nil {
		fmt.Printf("GOT HERE! %v\n", err)
		os.Exit(0)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
I see 6 spans logged, and a span called http.receive is not among them. I'm assuming that means it was never finished, and thus not exported? I'm not totally sure how that exporter works with STDOUT though.
I'm able to reproduce with the example provided.
This looks related to the trigger for the http.receive span being ClientTrace.GotFirstResponseByte. That field's function is not called without the body being read.
package main
import (
	"context"
	"fmt"
+       "io"
	"net/http"
	"net/http/httptrace"
	"os"
	"go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)
func main() {
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		fmt.Printf("creating stdout exporter: %v\n", err)
		os.Exit(0)
	}
	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
	)
	ctx := context.Background()
	defer tracerProvider.Shutdown(ctx)
	otel.SetTracerProvider(tracerProvider)
	client := http.Client{
		Transport: otelhttp.NewTransport(
			http.DefaultTransport,
			otelhttp.WithClientTrace(func(ctx context.Context) *httptrace.ClientTrace {
				return otelhttptrace.NewClientTrace(ctx)
			}),
		),
	}
	resp, err := client.Get("http://example.com")
	if err != nil {
		fmt.Printf("GOT HERE! %v\n", err)
		os.Exit(0)
	}
        defer resp.Body.Close()
+       _, _ = io.ReadAll(resp.Body)
        fmt.Println(resp.Status)
}
This example produces the http.receive span.
Gotcha, thanks @MrAlias. Do you agree that this is a bug then, and that the span shouldn't be created at all with the code example I provided?
Honestly, I'm not sure.
- It looks like this span behavior has been there for 4 years: https://github.com/open-telemetry/opentelemetry-go-contrib/blame/84370754c03498903d8618a210ab3dd5ae6e2059/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L292-L298
- Based on the structure of http.ClientTraceit doesn't seem like there is a better way to detect when a client response has started.
- I'm not sure we should be exporting a span for a response that is never read. I think for the same reason Go doesn't consider a response received until the body is ready we should do the same.
That all said, I think the span name is a bit misleading then. It implies that a client received a response, which is true in your example. This span is something that pre-dates the OTel HTTP semantic conventions and is not included in them. I wonder if it should be included at all.
@Aneurysm9 and @dmathieu are the maintainers of this instrumentation. I think they need to weigh in here.
That's fair. I guess my main concern is that there is a way for the default behavior of the OTel library integration to create a span and then never finish it, which can cause problems, such as preventing a trace to be sent while it waits for a child to finish. And in fact, the example code provided in that repo would do exactly this. So it certainly seems reasonable to fix the example code, but even better, it would be ideal to not create the span if there's no guarantee that it'll be finished.
Parent spans should never hang because a child has not ended. They are distinct independent objects, calling End on a parent in the default OTel SDK does not wait for all of its children to finish first. Where do you see this happening?
What I see happening is a span is started when its end signal is received prior to its start (which never comes) and then when the ClientTrace is no longer used the garbage collector reclaims the un-ended span (along with all the ended ones).
Parent spans should never hang because a child has not ended. They are distinct independent objects, calling End on a parent in the default OTel SDK does not wait for all of its children to finish first. Where do you see this happening?
Ah yeah you're right. It should flush them fine regardless.
But like you said, it's possible that it could have memory implications if these spans are sitting in a buffer somewhere, depending on the underlying TracerProvider. Hopefully the garbage collector will take care of it, like you said, but it isn't ideal that a span is getting created that's never finished.
it isn't ideal that a span is getting created that's never finished.
True.
I think this span in particular is an exception to the existing algorithm implemented in otelhttptrace. Other span may be ended prior to them starting, but in this case it is pretty certain that it won't be started given to start it would mean a user would have to read a closed body.
It's an edge case, but I would support a change to somehow prevent this.
See also https://github.com/open-telemetry/opentelemetry-go-contrib/issues/4046
I'm not opposed to renaming the span (or removing it). The current behavior is also, IMHO, correct, since the net/http documentation states the response body should always be closed, even if it wasn't read.
The caller must close the response body when finished with it
https://pkg.go.dev/net/http#pkg-overview
the net/http documentation states the response body should always be closed, even if it wasn't read.
In the example code I provided, the response body is closed. It's just not read. So it's a bit surprising that the span is started but not finished in that case.