DataDog Tracing Question
Thanks for the DataDog tracing integration (https://github.com/buildkite/agent/pull/1273)! We recently started incorporating it into our Buildkite cluster and found a few things we'd like:
Customizing/differentiating error
Right now the error stack trace comes from the Go agent code path. Is there a way to override this value, perhaps by sending information from the shell that the command is currently running in? Also, canceled builds pollute the error traces, and a way to filter them out would be very useful. We have the "cancel intermediate builds" option checked, so the error traces are hard to use at a high level for now.
Adding span from running the command
What can we do if we want to add more spans from the currently running command? Right now I believe spans are limited to the plugin/hook level. If the command itself has multiple steps, and we want to get the current trace ID so we can add more spans to the same trace, what can we do?
Sorry if this is not the correct place; if so, please direct me to the right forum 🙏🏼
Hi @dieend
Thanks for the feedback, we appreciate it!
Right now the error stack trace comes from the Go agent code path. Is there a way to override this value, perhaps by sending information from the shell that the command is currently running in? Also, canceled builds pollute the error traces, and a way to filter them out would be very useful. We have the "cancel intermediate builds" option checked, so the error traces are hard to use at a high level for now.
We propagate errors up the execution chain to the span, so it might be possible to capture this better, similar to how we capture the exit code. Glancing through, it looks non-trivial, but it might be possible to work in. This would also be where we'd add something for filtering out cancelations, e.g. an error.canceled tag.
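To make the filtering idea concrete, here is a rough sketch of how such a tag could be used. It is written in Python for brevity rather than the agent's Go, and everything in it is an assumption: the function names are hypothetical, and whether a canceled job would keep its error flag alongside an error.canceled tag has not been decided.

```python
def span_tags(exit_code, canceled):
    """Build the tags for a job span (hypothetical helper, not agent code)."""
    # Assumption: any non-zero exit code marks the span as an error.
    tags = {"error": exit_code != 0}
    if canceled:
        # Assumption: cancelation is flagged separately so it can be filtered.
        tags["error.canceled"] = True
    return tags


def is_real_failure(tags):
    """True for error spans that were not caused by a cancelation."""
    return tags["error"] and not tags.get("error.canceled", False)
```

With tags shaped like this, a dashboard query could exclude spans where error.canceled is set and keep only genuine failures.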
What can we do if we want to add more spans from the currently running command? Right now I believe spans are limited to the plugin/hook level. If the command itself has multiple steps, and we want to get the current trace ID so we can add more spans to the same trace, what can we do?
This is a hard one. We'd have to find a good way of passing the SpanContext from the agent to the executing process. I'd prefer to use a familiar pattern here, such as the one we have for buildkite-agent meta-data, to allow the current span context to be retrieved from a buildkite-agent command. Then, if we're running a process that has an OpenTracing library available, such as a Rakefile, Go script, Python script, or Node script, it could read in the SpanContext and use it to instrument child spans. This does inherently limit the use cases. For example, it would be clumsy (but probably not impossible) to do from a shell script, which is where most of our users put their build logic. With that limitation in mind, do you see yourselves using this?
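As a rough illustration of what the consuming side might look like, here is a minimal Python sketch. It assumes a hypothetical `buildkite-agent span-context get` subcommand (nothing like it exists today) that prints a JSON carrier keyed by Datadog's propagation header names. The helper names are made up for illustration, and a real script would hand the carrier to its tracing library instead of building span records by hand.

```python
import json
import secrets
import subprocess

# Datadog's HTTP propagation header names, reused here as carrier keys.
TRACE_ID_KEY = "x-datadog-trace-id"
PARENT_ID_KEY = "x-datadog-parent-id"

# HYPOTHETICAL command: buildkite-agent has no such subcommand today.
SPAN_CONTEXT_CMD = ["buildkite-agent", "span-context", "get"]


def parse_span_context(raw):
    """Parse a JSON text-map carrier into integer trace and parent span IDs."""
    carrier = json.loads(raw)
    return {
        "trace_id": int(carrier[TRACE_ID_KEY]),
        "parent_id": int(carrier[PARENT_ID_KEY]),
    }


def fetch_span_context(cmd=SPAN_CONTEXT_CMD):
    """Run the (hypothetical) agent command and parse what it prints."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_span_context(result.stdout)


def child_span(ctx, name):
    """Describe a child span: same trace, parented to the agent's span."""
    return {
        "name": name,
        "trace_id": ctx["trace_id"],
        "parent_id": ctx["parent_id"],
        # Random non-zero ID for the new span.
        "span_id": secrets.randbits(63) + 1,
    }
```

A build script would call `fetch_span_context()` once at startup and then create one child span per step. A shell script would need some other way to report spans, which is the clumsy case mentioned above.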
We propagate errors up the execution chain to the span, so it might be possible to capture this better, similar to how we capture the exit code. Glancing through, it looks non-trivial, but it might be possible to work in.
I understand. I hope the cancel tag won't be as hard as the custom error.
buildkite-agent meta-data to allow the current span context to be retrieved from a buildkite-agent command
Yes! I think this is a good compromise!