DataDog Tracing Question

Open dieend opened this issue 3 years ago • 2 comments

Thanks for the DataDog tracing integration (https://github.com/buildkite/agent/pull/1273)! We recently started incorporating it into our Buildkite cluster and found a few things we'd like:

Customizing/differentiating errors

Right now the error stack trace comes from the Go agent code path. Is there a way to override this value, perhaps by sending information from the shell about the command that is currently running? Also, canceled builds pollute the error traces, and a way to filter them out would be very useful. We have the "cancel intermediate builds" option checked, so the error traces are hard to use at a high level for now.

Adding spans from the running command

What can we do if we want to add more spans from the currently running command? Right now I believe tracing is limited to the plugin/hook level. If the command itself has multiple steps and we want to get the current trace ID so we can add more spans to the same trace, what can we do?

Sorry if this is not the correct place, please direct me to the right forum in that case 🙏🏼

dieend • Apr 12 '21 22:04

Hi @dieend

Thanks for the feedback, we appreciate it!

Right now the error stack trace comes from the Go agent code path. Is there a way to override this value, perhaps by sending information from the shell about the command that is currently running? Also, canceled builds pollute the error traces, and a way to filter them out would be very useful. We have the "cancel intermediate builds" option checked, so the error traces are hard to use at a high level for now.

We propagate errors up the execution chain to the span, so it might be possible to capture this better, similar to how we capture the exit code. At a glance it looks non-trivial, but it might be possible to work in. This would also be where we'd add something for filtering out cancellations, e.g. an error.canceled tag.
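To make the filtering idea concrete, here is a minimal sketch of how a cancellation tag could separate canceled jobs from real failures. This is not the agent's code (the agent is written in Go); the function names and the shape of the tag dictionary are assumptions made for illustration, and only the error.canceled tag name comes from the suggestion above.

```python
def tag_job_result(span_tags, exit_code, cancelled):
    """Return a copy of span_tags with result tags added.

    Hypothetical helper: a cancelled job gets "error.canceled" instead of
    "error", so dashboards can exclude it from failure counts.
    """
    tags = dict(span_tags)
    tags["exit_code"] = exit_code
    if cancelled:
        tags["error.canceled"] = True
    elif exit_code != 0:
        tags["error"] = True
    return tags


def real_failures(spans):
    """Keep only spans that failed and were not cancelled."""
    return [s for s in spans if s.get("error") and not s.get("error.canceled")]
```

With tags like these, a query over the spans can drop canceled builds without losing genuine errors.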

What can we do if we want to add more spans from the currently running command? Right now I believe tracing is limited to the plugin/hook level. If the command itself has multiple steps and we want to get the current trace ID so we can add more spans to the same trace, what can we do?

This is a hard one. We'd have to find a good way of passing the SpanContext from the agent to the executing process. I'd prefer to use a familiar pattern here, such as the one we have for buildkite-agent meta-data, to allow the current span context to be retrieved from a buildkite-agent command. Then, if we're running a process that has an OpenTracing library available (a Rakefile, Go script, Python script, Node script, etc.), it could instrument child spans by reading in the SpanContext. This does inherently limit the use cases. For example, it would be clumsy (but probably not impossible) to do with a shell script, which is where most of our users put their build logic. With that kind of limitation in mind, do you see yourselves using this?
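As a rough sketch of what the consuming side could look like, the snippet below reads a serialized span context and builds a child span that joins the same trace. Everything here is an assumption: no such agent command or variable exists today, EXAMPLE_SPAN_CONTEXT is a made-up name standing in for whatever the agent would expose, and the spans are plain dictionaries rather than a real tracing library. The carrier keys follow Datadog's text-map propagation headers.

```python
import json
import os
import random

# Hypothetical variable name; the agent does not define this today.
SPAN_CONTEXT_ENV = "EXAMPLE_SPAN_CONTEXT"


def read_span_context(environ=os.environ):
    """Return the parent span context as a dict, or None if absent or invalid."""
    raw = environ.get(SPAN_CONTEXT_ENV)
    if not raw:
        return None
    try:
        carrier = json.loads(raw)
        return {
            "trace_id": int(carrier["x-datadog-trace-id"]),
            "span_id": int(carrier["x-datadog-parent-id"]),
        }
    except (ValueError, KeyError, TypeError):
        return None


def start_child_span(parent, name):
    """Create a child span record that shares the parent's trace ID."""
    return {
        "name": name,
        "trace_id": parent["trace_id"],
        "parent_id": parent["span_id"],
        "span_id": random.getrandbits(63),
    }
```

A build script would call read_span_context() once, then start_child_span(parent, "compile") for each step it wants to appear in the trace. In practice a tracing library's own extract call would replace the manual parsing.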

chloeruka • Apr 13 '21 03:04

We propagate errors up the execution chain to the span, so it might be possible to capture this better, similar to how we capture the exit code. At a glance it looks non-trivial, but it might be possible to work in

I understand. I hope the cancel tag won't be as hard as the custom error.

buildkite-agent meta-data to allow the current span context to be retrieved from a buildkite-agent command

Yes! I think this is a good compromise!

dieend • Apr 13 '21 03:04