[BUG] agent tries to connect over http when unix socket endpoint is enabled

Open pdeva opened this issue 2 months ago • 4 comments

Version of dd-trace-go

1.62.0

Describe what happened:

In Kubernetes, we constantly see these errors for our Go services when tracing is enabled:

Screenshot 2024-04-12 at 7 33 33 PM

Describe what you expected:

While the APM functionality is still working for these services, this error shouldn't be present. It seems that while the tracer is still sending traces over the unix socket, it is also trying to talk to the Datadog agent over the HTTP port for some reason.

Steps to reproduce the issue:

this is how we init our tracer:

import (
	"go.opentelemetry.io/otel"
	ddotel "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/opentelemetry"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
	"gopkg.in/DataDog/dd-trace-go.v1/profiler"
)

func initTracer(cfg config.ServiceConfig) *ddotel.TracerProvider {
	if !cfg.IsLocalProfile() && !cfg.TracingDisabled {
		traceProvider := ddotel.NewTracerProvider(tracer.WithRuntimeMetrics())
		otel.SetTracerProvider(traceProvider)

		if config.ShouldProfile(cfg) {
			err := profiler.Start(
				profiler.WithService(cfg.ServiceName),
				profiler.WithVersion(cfg.ServiceVersion),
			)
			if err != nil {
				log.Fatal().Err(err)
			}
		}
		datastreams.Start()

		return traceProvider
	}

	return nil
}

here is the relevant config of the k8s deployment of each of the services:



      volumeMounts:
        - mountPath: /var/run/datadog
          name: apmsocketpath

      volumes:
      - hostPath:
          path: /var/run/datadog/
        name: apmsocketpath


      - env:
        - name: DD_TRACE_AGENT_URL
          value: unix:///var/run/datadog/apm.socket
        - name: DD_TRACE_SAMPLE_RATE
          value: "1.0"
        - name: DD_SERVICE
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['tags.datadoghq.com/service']
        - name: DD_VERSION
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['tags.datadoghq.com/version']

this is the relevant config of the datadog helm chart:

Screenshot 2024-04-12 at 7 37 58 PM

Additional environment details (Version of Go, Operating System, etc.):

EKS 1.29, Go 1.22.1

pdeva, Apr 13 '24 02:04

Thanks for reporting this @pdeva. We'll take a look in the next two weeks.

darccio, Apr 18 '24 08:04

@pdeva I just wrote a unit test to try to reproduce this, but the communication goes through the UNIX socket, as expected:

	t.Run("unix socket", func(t *testing.T) {
		if runtime.GOOS == "windows" {
			t.Skip("Unix domain sockets are non-functional on windows.")
		}
		srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			w.Write([]byte(`{"endpoints":["/v0.6/stats"],"client_drop_p0s":true,"statsd_port":9999}`))
		}))
		udsPath := "/tmp/com.datadoghq.dd-trace-go.test.sock"
		l, err := net.Listen("unix", udsPath)
		if err != nil {
			t.Fatal(err)
		}
		defer l.Close()

		srv.Listener = l
		srv.Start()
		defer srv.Close()

		t.Setenv("DD_TRACE_AGENT_URL", "unix://"+udsPath)
		cfg := newConfig()
		assert.Equal(t, "UDS__tmp_com.datadoghq.dd-trace-go.test.sock", cfg.agentURL.Host)

		assert.True(t, cfg.agent.DropP0s)
		assert.True(t, cfg.agent.Stats)
		assert.Equal(t, 9999, cfg.agent.StatsdPort)
	})

The most likely scenario is that some service is not seeing the environment variable and is therefore defaulting to the HTTP connection. WDYT?
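One way to check that hypothesis from inside an affected pod is a small standalone program like the sketch below. It is only a diagnostic under stated assumptions: it uses the Go standard library, does not import dd-trace-go, and does not reproduce how the tracer resolves its agent address. The function name `agentTransport` and the result labels it returns are hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// agentTransport is a hypothetical helper (not part of dd-trace-go).
// It classifies a DD_TRACE_AGENT_URL value and, for unix:// URLs,
// checks whether the socket file is visible to this process.
func agentTransport(raw string) (kind, detail string) {
	if raw == "" {
		// The variable is not visible to this process.
		return "unset", ""
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "invalid", err.Error()
	}
	switch u.Scheme {
	case "unix":
		fi, err := os.Stat(u.Path)
		if err != nil {
			// Missing volume mount, wrong path, or permissions.
			return "uds-unreachable", err.Error()
		}
		if fi.Mode()&os.ModeSocket == 0 {
			return "uds-not-socket", u.Path
		}
		return "uds-ok", u.Path
	case "http", "https":
		return "http", u.Host
	default:
		return "other", u.Scheme
	}
}

func main() {
	kind, detail := agentTransport(os.Getenv("DD_TRACE_AGENT_URL"))
	fmt.Printf("DD_TRACE_AGENT_URL: %s %s\n", kind, detail)
}
```

Running this in the same container as an affected service should show whether the variable is set there and whether the path under /var/run/datadog resolves to a socket. It says nothing about which component is making the HTTP request.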

darccio, Apr 24 '24 14:04

I cannot tell you what the bug is; it's not our agent. I can only tell you what we are observing, and we are seeing this issue for every single Go service using the Datadog agent.

pdeva, Apr 24 '24 17:04

@pdeva I didn't ask you that. Sorry for that. What you are observing doesn't match my initial findings. I'll try to reproduce it with a realistic setup.

darccio, Apr 25 '24 07:04