agent
agent copied to clipboard
Support Datadog unix sockets for reporting traces
Supporting resources
- https://buildkite.com/blog/observability-in-the-agent-via-datadog
- https://docs.datadoghq.com/continuous_integration/setup_pipelines/buildkite/
- https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=host
- #1273
Is your feature request related to a problem? Please describe.
Traces aren’t sent to Datadog when a unix socket is used, because Buildkite only support sending traces via HTTP.
So, even though my Buildkite Agent is configured properly to send traces via the volume mounted socket, they will not send, because the Buildkite implementation expects to use HTTP requests, defaulting to http://localhost:8126/v0.4/traces
It appears this is a bug!
Describe the solution you'd like
If one of these environment variables are set:
- name: DD_TRACE_AGENT_URL
value: "unix:///var/run/datadog/apm.socket"
- name: DD_APM_RECEIVER_SOCKET
value: "unix:///var/run/datadog/apm.socket"
Then I expect the Buildkite Agent to use that address to send traces rather than the default localhost:8126
address.
Describe alternatives you've considered
Hacking together the address by setting "HOST" to unix
and "PORT" to ///var/run/datadog/apm.socket
(Doesn't work because of a failure to resolve DNS)
Additional context
With these environment variables:
- name: BUILDKITE_TRACING_BACKEND
value: "datadog"
- name: DD_ENV
value: "buildkite"
- name: DD_LOGS_INJECTION
value: "true"
- name: DD_APM_ENABLED
value: "true"
- name: DD_TRACE_ENABLED
value: "true"
- name: DD_TRACE_AGENT_URL
value: "unix:///var/run/datadog/apm.socket"
- name: DD_APM_RECEIVER_SOCKET
value: "unix:///var/run/datadog/apm.socket"
- name: DD_PROFILING_ENABLED
value: "true"
- name: DD_VERSION
value: "1.2.2"
I receive this error at the end of the Buildkite job:
2022/04/19 17:45:28 Datadog Tracer v1.28.0 ERROR: lost 1 traces: Post \"http://localhost:8126/v0.4/traces\": dial tcp 127.0.0.1:8126: connect: connection refused (occurred: 19 Apr 22 17:45 UTC)\r
","size":80171,"header_times":[1650390251265893076,1650390251768560569,1650390251771733083,1650390253199885239,1650390253201794984,1650390253622077377,1650390307627264211,1650390318180815416,1650390328697174000]
this is a really good callout! we're going to be doing some more work around tracing and observability soon, so i'll keep this in mind for when we do.
kia ora @agates4! I've had a bit more of a chance to engage my brain on this this afternoon.
As far as i can tell, there's no way to tell dd-trace-go
to use the UDS socket, other than either:
- just having a socket file in the right magic location. It looks like if there's a socket there, it'll use it, and otherwise it'll use normal UDP
- Manually setting
WithUDS
on the tracer instance
With this in mind, we could theoretically do something like
// in bootstrap/tracing.go
func (b *Bootstrap) startTracingDatadog(ctx context.Context) (tracetools.Span, context.Context, stopper) {
opts := []tracer.StartOption{
tracer.WithServiceName("buildkite_agent"),
tracer.WithSampler(tracer.NewAllSampler()),
tracer.WithAnalytics(true),
}
if sock := os.Getenv("DD_APM_RECEIVER_SOCKET"); sock != "" {
opts = append(opts, tracer.WithUDS(sock))
}
// ... rest of the tracing setup
}
...butttt i'm not super keen to do something like this for a couple of reasons. The main one is that the socket location is a concern somewhat external to the agent itself; we don't really want to be explicitly handling an environment variable that's meant for the datadog agent.
my recommendation at this stage would be to open a PR against dd-trace-go
that allows the UDS socket to be specified by environment variable - it's a bit weird that on that repo, some stuff can be configured with envars and some can't, and it's basically a crapshoot as to what's what.
the other option would be to create a socket at the magic location above, but (bit of a digression here) it seems like both the datadog agent and the tracing library expect the socket to exist prior to their initialisation, which seems like a weirdly hands-off approach. I would've expected the datadog agent to create the socket and have the libraries connect to it? It's all a bit odd.
I realise "go implement it yourself" isn't exactly am awesome outcome for this issue; i'm happy to discuss further and see if we can get to some sort of a happy middle ground :)
Hey Moskyb, thanks for the detailed explanation!
Even if the hosts are not set, and the file exists in the magic location, the datadog client will still use localhost:8126
by default over http without UDS.
To fix this, I implemented this change and submitted a PR to the Go client https://github.com/DataDog/dd-trace-go/pull/1298
This should offload any changes from Buildkite and allow the user set environment variables to handle the rest!
Awesome! When that gets released, we'll update dd-trace-go to use that feature :)
I'll leave this issue open for tracking until then