dd-trace-go
[BUG] agent tries to connect over http when unix socket endpoint is enabled
Version of dd-trace-go
1.62.0
Describe what happened:
In Kubernetes, we are constantly seeing these errors for our Go services when tracing is enabled:
Describe what you expected:
While the APM functionality is still working for these services, this error shouldn't be present. It seems that while the tracer is still sending traces over the Unix socket, it is also trying to talk to the Datadog agent over the HTTP port for some reason.
Steps to reproduce the issue:
this is how we init our tracer:
import (
	"go.opentelemetry.io/otel"
	ddotel "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/opentelemetry"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
	"gopkg.in/DataDog/dd-trace-go.v1/profiler"
)

// initTracer sets up tracing (and optionally profiling) for non-local profiles.
// It returns nil when tracing is disabled or the service runs locally.
func initTracer(cfg config.ServiceConfig) *ddotel.TracerProvider {
	if !cfg.IsLocalProfile() && !cfg.TracingDisabled {
		traceProvider := ddotel.NewTracerProvider(tracer.WithRuntimeMetrics())
		otel.SetTracerProvider(traceProvider)
		if config.ShouldProfile(cfg) {
			err := profiler.Start(
				profiler.WithService(cfg.ServiceName),
				profiler.WithVersion(cfg.ServiceVersion),
			)
			if err != nil {
				// Assuming log is zerolog: the event is only written (and the
				// process only exits) once Msg or Send is called.
				log.Fatal().Err(err).Msg("failed to start profiler")
			}
		}
		datastreams.Start()
		return traceProvider
	}
	return nil
}
here is the relevant config of the k8s deployment of each of the services:
volumeMounts:
  - mountPath: /var/run/datadog
    name: apmsocketpath
volumes:
  - hostPath:
      path: /var/run/datadog/
    name: apmsocketpath
- env:
    - name: DD_TRACE_AGENT_URL
      value: unix:///var/run/datadog/apm.socket
    - name: DD_TRACE_SAMPLE_RATE
      value: "1.0"
    - name: DD_SERVICE
      valueFrom:
        fieldRef:
          fieldPath: metadata.labels['tags.datadoghq.com/service']
    - name: DD_VERSION
      valueFrom:
        fieldRef:
          fieldPath: metadata.labels['tags.datadoghq.com/version']
this is the relevant config of the datadog helm chart:
Additional environment details (Version of Go, Operating System, etc.):
EKS 1.29, Go 1.22.1
Thanks for reporting this @pdeva. We'll take a look in the next two weeks.
@pdeva I just wrote a unit test to check whether I could reproduce it, but the communication occurs through the Unix socket, as expected:
t.Run("unix socket", func(t *testing.T) {
	if runtime.GOOS == "windows" {
		t.Skip("Unix domain sockets are non-functional on windows.")
	}
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte(`{"endpoints":["/v0.6/stats"],"client_drop_p0s":true,"statsd_port":9999}`))
	}))
	udsPath := "/tmp/com.datadoghq.dd-trace-go.test.sock"
	l, err := net.Listen("unix", udsPath)
	if err != nil {
		t.Fatal(err)
	}
	defer l.Close()
	srv.Listener = l
	srv.Start()
	defer srv.Close()
	t.Setenv("DD_TRACE_AGENT_URL", "unix://"+udsPath)
	cfg := newConfig()
	assert.Equal(t, "UDS__tmp_com.datadoghq.dd-trace-go.test.sock", cfg.agentURL.Host)
	assert.True(t, cfg.agent.DropP0s)
	assert.True(t, cfg.agent.Stats)
	assert.Equal(t, 9999, cfg.agent.StatsdPort)
})
The most likely scenario is that some service is not seeing the environment variable and is therefore defaulting to the HTTP connection. WDYT?
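One way to check that hypothesis from inside an affected pod is a small standalone program that reports which transport `DD_TRACE_AGENT_URL` would imply. The sketch below is a simplified stand-in and not dd-trace-go's actual resolution logic: `resolveAgentTransport` is a hypothetical helper, and the fallback of `localhost:8126` is assumed to match the tracer's default when the variable is unset.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// defaultAgentAddr is the assumed TCP fallback when no agent URL is configured.
const defaultAgentAddr = "localhost:8126"

// resolveAgentTransport is a hypothetical helper (not part of dd-trace-go).
// It maps an agent URL to the network and address a client would dial.
func resolveAgentTransport(raw string) (network, addr string, err error) {
	if raw == "" {
		// Variable not visible to the process: fall back to HTTP over TCP.
		return "tcp", defaultAgentAddr, nil
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("parse %q: %w", raw, err)
	}
	switch u.Scheme {
	case "unix":
		// For unix:///path/to.socket the socket path is the URL path.
		return "unix", u.Path, nil
	case "http", "https":
		return "tcp", u.Host, nil
	default:
		return "", "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	raw := os.Getenv("DD_TRACE_AGENT_URL")
	network, addr, err := resolveAgentTransport(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Printf("DD_TRACE_AGENT_URL=%q -> network=%s addr=%s\n", raw, network, addr)

	// If a socket is configured, also confirm it is mounted in this container.
	if network == "unix" {
		if _, err := os.Stat(addr); err != nil {
			fmt.Fprintln(os.Stderr, "socket not reachable:", err)
			os.Exit(1)
		}
	}
}
```

If this prints `network=tcp` in a container that is supposed to use the socket, the variable is not reaching the process. If it prints `network=unix` and the errors persist, some other component in the service is probably picking its own agent address.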
I cannot tell you what the bug is; it's not our agent. I can only tell you what we are observing, and we are seeing this issue for every single Go service using the Datadog agent.
@pdeva I didn't mean to ask you that, sorry. What you are observing doesn't match my initial findings. I'll try to reproduce it with a realistic setup.
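For anyone who wants to probe a socket by hand while this is investigated, here is a minimal sketch of an HTTP client that always dials a Unix socket, using only the standard library. The names `newUnixClient` and `demo` are hypothetical, the server is a local stand-in rather than a real Datadog agent, and the `/info` path is only a placeholder that the stand-in answers like any other path.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// newUnixClient returns an HTTP client whose connections always dial the
// given Unix socket, whatever host appears in the request URL.
func newUnixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// demo starts a stand-in HTTP server on a temporary socket, queries it
// through newUnixClient, and returns the response body.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "uds-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	socketPath := filepath.Join(dir, "apm.socket")
	l, err := net.Listen("unix", socketPath)
	if err != nil {
		return "", err
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, "ok")
	})}
	go srv.Serve(l) // returns once srv.Close is called
	defer srv.Close()

	// The host part ("unix") is ignored by the custom dialer.
	resp, err := newUnixClient(socketPath).Get("http://unix/info")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(body)
}
```

Pointing `newUnixClient` at the mounted socket path from the deployment instead of the temporary one would show whether the socket accepts connections from inside the container.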