dd-trace-go

[BUG] agent tries to connect over http when unix socket endpoint is enabled

Status: Open. pdeva opened this issue 1 year ago (4 comments).

Version of dd-trace-go

1.62.0

Describe what happened:

In Kubernetes, we constantly see these errors for our Go services when tracing is enabled:

Screenshot 2024-04-12 at 7 33 33 PM

Describe what you expected:

While the APM functionality is still working for these services, this error shouldn't be present. It seems that while traces are still being sent over the Unix socket, the tracer is also trying to talk to the Datadog Agent over the HTTP port for some reason.

Steps to reproduce the issue:

This is how we initialize our tracer:

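import (
	"github.com/DataDog/data-streams-go/datastreams" // deprecated; later identified in this thread as the source of the HTTP-port requests
	"github.com/rs/zerolog/log"                      // assumed from the log.Fatal().Err(...) call style
	"go.opentelemetry.io/otel"
	ddotel "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/opentelemetry"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
	"gopkg.in/DataDog/dd-trace-go.v1/profiler"
	// plus the service's internal config package (import path not shown in the report)
)

func initTracer(cfg config.ServiceConfig) *ddotel.TracerProvider {
	if !cfg.IsLocalProfile() && !cfg.TracingDisabled {
		// Starts the dd-trace-go tracer behind an OpenTelemetry TracerProvider.
		traceProvider := ddotel.NewTracerProvider(tracer.WithRuntimeMetrics())
		otel.SetTracerProvider(traceProvider)

		if config.ShouldProfile(cfg) {
			err := profiler.Start(
				profiler.WithService(cfg.ServiceName),
				profiler.WithVersion(cfg.ServiceVersion),
			)
			if err != nil {
				// A zerolog event is only written (and the process only exits) once Msg or Send is called.
				log.Fatal().Err(err).Msg("failed to start profiler")
			}
		}
		datastreams.Start()

		return traceProvider
	}

	return nil
}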

Here is the relevant config from the Kubernetes deployment of each service:



      volumeMounts:
        - mountPath: /var/run/datadog
          name: apmsocketpath

      volumes:
      - hostPath:
          path: /var/run/datadog/
        name: apmsocketpath


      - env:
        - name: DD_TRACE_AGENT_URL
          value: unix:///var/run/datadog/apm.socket
        - name: DD_TRACE_SAMPLE_RATE
          value: "1.0"
        - name: DD_SERVICE
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['tags.datadoghq.com/service']
        - name: DD_VERSION
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['tags.datadoghq.com/version']

This is the relevant config of the Datadog Helm chart:

Screenshot 2024-04-12 at 7 37 58 PM

Additional environment details (Version of Go, Operating System, etc.):

EKS 1.29, Go 1.22.1

pdeva, Apr 13 '24 02:04

Thanks for reporting this @pdeva. We'll take a look in the next two weeks.

darccio, Apr 18 '24 08:04

@pdeva I wrote a unit test to check whether I could reproduce it, but the communication goes through the Unix socket, as expected:

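	t.Run("unix socket", func(t *testing.T) {
		if runtime.GOOS == "windows" {
			t.Skip("Unix domain sockets are non-functional on windows.")
		}
		// Fake agent: answers every request (including /info) with a canned payload.
		srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			w.Write([]byte(`{"endpoints":["/v0.6/stats"],"client_drop_p0s":true,"statsd_port":9999}`))
		}))
		// Serve the fake agent on a Unix domain socket instead of TCP.
		udsPath := "/tmp/com.datadoghq.dd-trace-go.test.sock"
		l, err := net.Listen("unix", udsPath)
		if err != nil {
			t.Fatal(err)
		}
		defer l.Close()

		srv.Listener = l
		srv.Start()
		defer srv.Close()

		// Point the tracer at the socket, as the Kubernetes deployment does.
		t.Setenv("DD_TRACE_AGENT_URL", "unix://"+udsPath)
		cfg := newConfig()
		// The socket path is encoded into a placeholder HTTP host.
		assert.Equal(t, "UDS__tmp_com.datadoghq.dd-trace-go.test.sock", cfg.agentURL.Host)

		// These values can only come from the fake agent's payload,
		// so the feature-discovery request went over the socket.
		assert.True(t, cfg.agent.DropP0s)
		assert.True(t, cfg.agent.Stats)
		assert.Equal(t, 9999, cfg.agent.StatsdPort)
	})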
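The expected host in the test above shows how the socket path is folded into an HTTP host name: a `unix://` URL has no host of its own, so a placeholder is derived from the path. A minimal sketch that reproduces that one mapping; `udsHost` is a hypothetical name, and the rule "replace `/` with `_` and prefix `UDS_`" is inferred from this single example, so dd-trace-go's real encoding may treat other characters differently.

```go
package main

import (
	"fmt"
	"strings"
)

// udsHost derives a placeholder HTTP host from a Unix socket path.
// Assumption: "/" becomes "_" and the result is prefixed with "UDS_".
// This matches the host asserted in the test but is not dd-trace-go's code.
func udsHost(socketPath string) string {
	return "UDS_" + strings.ReplaceAll(socketPath, "/", "_")
}

func main() {
	fmt.Println(udsHost("/tmp/com.datadoghq.dd-trace-go.test.sock")) // UDS__tmp_com.datadoghq.dd-trace-go.test.sock
}
```

The placeholder only has to be a stable, valid host for the HTTP layer; the dialer never resolves it.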

The most likely scenario is that some service isn't seeing the environment variable and is therefore falling back to the default HTTP connection over TCP. WDYT?
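The fallback described above can be sketched as follows. This is a deliberately simplified, hypothetical resolver (`resolveAgentURL` is not a dd-trace-go function): it only looks at `DD_TRACE_AGENT_URL` and otherwise uses the Agent's default TCP address, whereas the real library also consults other settings such as `DD_AGENT_HOST`.

```go
package main

import (
	"fmt"
	"os"
)

// defaultAgentURL is the Datadog Agent's default trace intake address over TCP.
const defaultAgentURL = "http://localhost:8126"

// resolveAgentURL is a hypothetical, simplified resolver: it prefers
// DD_TRACE_AGENT_URL and otherwise falls back to the TCP default.
// getenv is injected so the logic can be exercised without touching the
// real process environment.
func resolveAgentURL(getenv func(string) string) string {
	if u := getenv("DD_TRACE_AGENT_URL"); u != "" {
		return u
	}
	return defaultAgentURL
}

func main() {
	fmt.Println(resolveAgentURL(os.Getenv))
}
```

A service whose pod spec lacks the variable would silently take the second branch and try `localhost:8126`, which is what connection errors on the HTTP port would look like.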

darccio, Apr 24 '24 14:04

I can't tell you what the bug is; it's not our agent. I can only tell you what we are observing, and we are seeing this issue for every single Go service using the Datadog Agent.

pdeva, Apr 24 '24 17:04

@pdeva Sorry, I didn't mean to ask you that. What you are observing doesn't match my initial findings. I'll try to reproduce it with a realistic setup.

darccio, Apr 25 '24 07:04

Reproduced. I'm going to investigate the root cause.

darccio, Jul 09 '24 10:07

OK, found it. It comes from github.com/DataDog/data-streams-go/datastreams. That's why I didn't find it initially.

darccio, Jul 09 '24 13:07

@pdeva TL;DR: Your setup is using the deprecated library github.com/DataDog/data-streams-go.

dd-trace-go can issue HTTP requests over either TCP or Unix Domain Sockets. Whichever socket the HTTP client uses underneath, the requests themselves are still HTTP, so it's normal to see HTTP requests in your environment.
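To make that concrete: an HTTP request sent through a Unix socket is still an ordinary HTTP request; only the dialer changes. A minimal stdlib-only sketch, assuming a throwaway socket in a temp directory stands in for `/var/run/datadog/apm.socket`; `newUDSClient` and `roundTrip` are illustrative names, not dd-trace-go APIs.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// newUDSClient returns an HTTP client whose connections are always dialed
// to socketPath, whatever host the request URL names.
func newUDSClient(socketPath string) *http.Client {
	dialer := &net.Dialer{}
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// roundTrip starts a stand-in agent on a Unix socket in a temp dir, sends one
// GET to it through newUDSClient, and returns the protocol plus the body.
func roundTrip(path string) (string, error) {
	dir, err := os.MkdirTemp("", "uds")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "apm.socket")
	l, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s %s", r.Method, r.URL.Path)
	})}
	go srv.Serve(l) // returns once srv.Close is called
	defer srv.Close()

	// "placeholder" is never resolved: the custom dialer ignores the host.
	resp, err := newUDSClient(sock).Get("http://placeholder" + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return resp.Proto + " " + string(body), nil
}

func main() {
	out, err := roundTrip("/info")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // HTTP/1.1 GET /info
}
```

A client that builds its own transport without such a dialer would instead resolve the URL's host and connect over TCP, even when the tracer itself is configured for the socket.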

That being said, in your case there is an issue in the deprecated data-streams-go: it doesn't respect the UDS host needed to send requests through the default UDS path /var/run/datadog/apm.socket.

The fix in your case is to use the new datastreams integration, as suggested in the deprecated library's repository: Setup Data Streams Monitoring for Go

darccio, Jul 09 '24 15:07