opentelemetry-go-contrib icon indicating copy to clipboard operation
opentelemetry-go-contrib copied to clipboard

otelhttptrace: http.receive spans not finished

Open katiehockman opened this issue 1 year ago • 15 comments

Description

Spans with the resource name http.receive are being created, but never finished in the lifetime of the program.

Environment

  • OS: iOS
  • Architecture: x85
  • Go Version: go1.21.6 darwin/arm64
  • otelhttptrace version: v0.47.0

Steps To Reproduce

I am using a Datadog tracer for testing, with the following client code, using the example here:

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptrace"

	ddotel "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/opentelemetry"
	ddtracer "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
	"go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
)

func main() {
	provider := ddotel.NewTracerProvider(ddtracer.WithDebugMode(true))
	defer provider.Shutdown()
	otel.SetTracerProvider(provider)

	client := http.Client{
		Transport: otelhttp.NewTransport(
			http.DefaultTransport,
			otelhttp.WithClientTrace(func(ctx context.Context) *httptrace.ClientTrace {
				return otelhttptrace.NewClientTrace(ctx)
			}),
		),
	}
	resp, err := client.Get("http://example.com")
	if err != nil {
		fmt.Println(error)
		os.Exit(0)
	}
	defer resp.Body.Close()
}

When I run this code and check the debug logs, I can see that 7 spans are getting started, but only 6 spans are getting finished. The one not getting finished has the following debug line:

2024/01/30 16:20:32 Datadog Tracer v1.61.0-dev DEBUG: Started Span: dd.trace_id="6805150280004253875" dd.span_id="1206165090554086340" dd.parent_id="6805150280004253875", Operation: http.receive, Resource: http.receive, Tags: map[language:go runtime-id:aad56e8c-1c67-48f0-82e2-e6d66ec77559 span.kind:client], map[_sampling_priority_v1:1 process_id:56444]

This is a span named http.receive, which appears to be getting created here: https://github.com/open-telemetry/opentelemetry-go-contrib/blob/84370754c03498903d8618a210ab3dd5ae6e2059/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L297

I can see in the code that there is a function present for finishing this span, but it doesn't appear to be getting called. I don't see any unit tests that use putIdleConn, whereas the other similar methods have them, so I wonder if there could perhaps be a bug where putIdleConn is not being executed as expected.

Expected behavior

I expect all spans that are created to be finished before the program exits. i.e. there should be a line that looks something like this as well:

2024/01/30 16:20:32 Datadog Tracer v1.61.0-dev DEBUG: Finished Span: dd.trace_id="6805150280004253875" dd.span_id="1206165090554086340" dd.parent_id="6805150280004253875", Operation: http.receive, Resource: http.receive, Tags: map[language:go runtime-id:aad56e8c-1c67-48f0-82e2-e6d66ec77559 span.kind:client], map[_sampling_priority_v1:1 process_id:56444]

katiehockman avatar Jan 30 '24 21:01 katiehockman

Thanks for helping with the bug report. Do you have the same issue if you use the default SDK with a STDOUT exporter? E.g.

https://github.com/open-telemetry/opentelemetry-go/blob/8d3ae4c5ede1a4c6ceb769f43cc759fdbf3d28f8/exporters/stdout/stdouttrace/example_test.go#L65-L78

MrAlias avatar Jan 30 '24 22:01 MrAlias

Just tested with that example, using this code:

	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		fmt.Printf("creating stdout exporter: %v\n", err)
		os.Exit(0)
	}

	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(Resource()),
	)
	ctx := context.Background()
	defer tracerProvider.Shutdown(ctx)
	otel.SetTracerProvider(tracerProvider)

	client := http.Client{
		Transport: otelhttp.NewTransport(
			http.DefaultTransport,
			otelhttp.WithClientTrace(func(ctx context.Context) *httptrace.ClientTrace {
				return otelhttptrace.NewClientTrace(ctx)
			}),
		),
	}

	resp, err := client.Get("http://example.com")
	if err != nil {
		fmt.Printf("GOT HERE! %v\n", err)
		os.Exit(0)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)

I see 6 spans logged, and a span called http.receive is not among them. I'm assuming that means it was never finished, and thus not exported? I'm not totally sure how that exporter works with STDOUT though.

katiehockman avatar Jan 31 '24 15:01 katiehockman

I'm able to reproduce with the example provided.

This looks related to the trigger for the http.receive span being ClientTrace.GotFirstResponseByte. That field's function is not called without the body being read.

package main

import (
	"context"
	"fmt"
+       "io"
	"net/http"
	"net/http/httptrace"
	"os"

	"go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		fmt.Printf("creating stdout exporter: %v\n", err)
		os.Exit(0)
	}

	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
	)
	ctx := context.Background()
	defer tracerProvider.Shutdown(ctx)
	otel.SetTracerProvider(tracerProvider)

	client := http.Client{
		Transport: otelhttp.NewTransport(
			http.DefaultTransport,
			otelhttp.WithClientTrace(func(ctx context.Context) *httptrace.ClientTrace {
				return otelhttptrace.NewClientTrace(ctx)
			}),
		),
	}

	resp, err := client.Get("http://example.com")
	if err != nil {
		fmt.Printf("GOT HERE! %v\n", err)
		os.Exit(0)
	}
        defer resp.Body.Close()
+       _, _ = io.ReadAll(resp.Body)
        fmt.Println(resp.Status)
}

This example produces the http.receive span.

MrAlias avatar Jan 31 '24 22:01 MrAlias

Gotcha, thanks @MrAlias. Do you agree that this is a bug then, and that the span shouldn't be created at all with the code example I provided?

katiehockman avatar Feb 01 '24 16:02 katiehockman

Honestly, I'm not sure.

  • It looks like this span behavior has been there for 4 years: https://github.com/open-telemetry/opentelemetry-go-contrib/blame/84370754c03498903d8618a210ab3dd5ae6e2059/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L292-L298
  • Based on the structure of http.ClientTrace it doesn't seem like there is a better way to detect when a client response has started.
  • I'm not sure we should be exporting a span for a response that is never read. I think for the same reason Go doesn't consider a response received until the body is ready we should do the same.

That all said, I think the span name is a bit misleading then. It implies that a client received a response, which is true in your example. This span is something that pre-dates the OTel HTTP semantic conventions and is not included in them. I wonder if it should be included at all.

@Aneurysm9 and @dmathieu are the maintainers of this instrumentation. I think they need to weigh in here.

MrAlias avatar Feb 01 '24 19:02 MrAlias

That's fair. I guess my main concern is that there is a way for the default behavior of the OTel library integration to create a span and then never finish it, which can cause problems, such as preventing a trace to be sent while it waits for a child to finish. And in fact, the example code provided in that repo would do exactly this. So it certainly seems reasonable to fix the example code, but even better, it would be ideal to not create the span if there's no guarantee that it'll be finished.

katiehockman avatar Feb 01 '24 19:02 katiehockman

Parent spans should never hang because a child has not ended. They are distinct independent objects, calling End on a parent in the default OTel SDK does not wait for all of its children to finish first. Where do you see this happening?

What I see happening is a span is started when its end signal is received prior to its start (which never comes) and then when the ClientTrace is no longer used the garbage collector reclaims the un-ended span (along with all the ended ones).

MrAlias avatar Feb 01 '24 20:02 MrAlias

Parent spans should never hang because a child has not ended. They are distinct independent objects, calling End on a parent in the default OTel SDK does not wait for all of its children to finish first. Where do you see this happening?

Ah yeah you're right. It should flush them fine regardless.

But like you said, it's possible that it could have memory implications if these spans are sitting in a buffer somewhere, depending on the underlying TracerProvider. Hopefully the garbage collector will take care of it, like you said, but it isn't ideal that a span is getting created that's never finished.

katiehockman avatar Feb 01 '24 20:02 katiehockman

it isn't ideal that a span is getting created that's never finished.

True.

I think this span in particular is an exception to the existing algorithm implemented in otelhttptrace. Other span may be ended prior to them starting, but in this case it is pretty certain that it won't be started given to start it would mean a user would have to read a closed body.

It's an edge case, but I would support a change to somehow prevent this.

MrAlias avatar Feb 01 '24 20:02 MrAlias

See also https://github.com/open-telemetry/opentelemetry-go-contrib/issues/4046

I'm not opposed to renaming the span (or removing it). The current behavior is also, IMHO, correct, since the net/http documentation states the response body should always be closed, even if it wasn't read.

The caller must close the response body when finished with it

https://pkg.go.dev/net/http#pkg-overview

dmathieu avatar Feb 05 '24 09:02 dmathieu

the net/http documentation states the response body should always be closed, even if it wasn't read.

In the example code I provided, the response body is closed. It's just not read. So it's a bit surprising that the span is started but not finished in that case.

katiehockman avatar Feb 05 '24 16:02 katiehockman