dd-trace-go

tracer: add support for partial flushing

Open · yaadata opened this issue

Description

Add support for the following environment variables:

  • DD_TRACER_PARTIAL_FLUSH_ENABLED
  • DD_TRACER_PARTIAL_FLUSH_MIN_SPANS

These variables are documented and available in the Ruby, Python, and Java Datadog tracing libraries.
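
For illustration only, here is a minimal sketch of how a Go tracer might read these two settings, using the variable names exactly as written in this issue. The `partialFlushConfig` type, `loadPartialFlushConfig` function, and the fallback of 1000 spans are hypothetical and are not dd-trace-go's actual configuration code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// partialFlushConfig is a hypothetical struct holding the two settings
// proposed in this issue; it is not dd-trace-go's internal config type.
type partialFlushConfig struct {
	Enabled  bool
	MinSpans int
}

// defaultMinSpans is an assumed fallback value, chosen for illustration.
const defaultMinSpans = 1000

// loadPartialFlushConfig reads the proposed environment variables through
// getenv (os.Getenv in production, a stub in tests). Missing or invalid
// values leave partial flushing disabled and MinSpans at defaultMinSpans.
func loadPartialFlushConfig(getenv func(string) string) partialFlushConfig {
	cfg := partialFlushConfig{Enabled: false, MinSpans: defaultMinSpans}
	if v, err := strconv.ParseBool(getenv("DD_TRACER_PARTIAL_FLUSH_ENABLED")); err == nil {
		cfg.Enabled = v
	}
	if n, err := strconv.Atoi(getenv("DD_TRACER_PARTIAL_FLUSH_MIN_SPANS")); err == nil && n > 0 {
		cfg.MinSpans = n
	}
	return cfg
}

func main() {
	cfg := loadPartialFlushConfig(os.Getenv)
	fmt.Printf("partial flush enabled=%v minSpans=%d\n", cfg.Enabled, cfg.MinSpans)
}
```

Passing `getenv` as a parameter keeps the parsing testable without mutating the real process environment.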

Why

I'm currently working on a project with a long-running process (it can take 1hr+ to finish). The process has a single root span that covers the entire run, but with the current dd-trace-go library it fails with the following error:

Datadog Tracer v1.34.0 ERROR: trace buffer full (100000), dropping trace (occurred: 10 Aug 22 17:31 UTC)

To my understanding, adding support for partial flushes would help resolve this issue.
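
A simplified model of the failure mode the error message suggests: every span of a trace stays buffered until the root span finishes, and once the per-trace buffer exceeds its cap (100000 in the log above) the whole trace is discarded. The `traceBuffer` and `span` types below are hypothetical stand-ins, not dd-trace-go internals.

```go
package main

import "fmt"

// maxSpansPerTrace mirrors the 100000 figure in the error message above.
const maxSpansPerTrace = 100000

// span is a hypothetical, minimal stand-in for a tracer span.
type span struct {
	name string
}

// traceBuffer models a tracer that holds every span of a trace in memory
// until the root span finishes, then sends them all at once.
type traceBuffer struct {
	spans   []*span
	dropped bool
	limit   int // 0 means use maxSpansPerTrace
}

// add records a span; once the cap is reached the entire trace is discarded.
func (t *traceBuffer) add(s *span) {
	if t.dropped {
		return
	}
	limit := t.limit
	if limit <= 0 {
		limit = maxSpansPerTrace
	}
	if len(t.spans) >= limit {
		t.dropped = true
		t.spans = nil // memory is released, but every span in the trace is lost
		fmt.Printf("trace buffer full (%d), dropping trace\n", limit)
		return
	}
	t.spans = append(t.spans, s)
}

// finishRoot returns the spans to send, or nil if the trace was dropped.
func (t *traceBuffer) finishRoot() []*span {
	if t.dropped {
		return nil
	}
	out := t.spans
	t.spans = nil
	return out
}

func main() {
	t := &traceBuffer{limit: 3} // tiny cap so the example is easy to follow
	for i := 0; i < 5; i++ {
		t.add(&span{name: fmt.Sprintf("child-%d", i)})
	}
	fmt.Println("spans sent:", len(t.finishRoot()))
}
```

With a single root span open for over an hour, nothing leaves the buffer until the end, which is why the cap is eventually hit.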

If it helps, I can submit a PR; I just need guidance on navigating the codebase to implement this change.

yaadata Aug 15 '22 16:08

Any updates on this issue?

yaadata Aug 26 '22 21:08

Hi @Ydot19 - thanks for bringing this up. For our own understanding, would you be able to explain your use case here? You mentioned that you are generating 100K spans over the course of an hour+ for a single trace. How do you expect to use these spans? Do you need the trace spans, or just the metrics?

katiehockman Sep 02 '22 16:09

We're going to close this due to inactivity, but please file a new issue or reach out to your customer rep / support if this is still an issue for you.

katiehockman Nov 15 '22 15:11

We have run into this issue, or at least its side effects, on multiple occasions. Some of our services use websockets, and developers naturally pass the context of the initial HTTP request into the socket handler. As a result, the service accumulates spans for the lifetime of the socket and eventually crashes with an OOM.

So our use case isn't really that we need long traces; rather, the number of spans per trace should ideally be capped at a configurable limit to prevent such mistakes by default.
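
One way to picture the configurable cap being requested: keep the first `limit` spans of a trace and only count the rest, so memory stays bounded even if a request context leaks into a long-lived websocket handler. This is a hypothetical sketch; `cappedTrace` and `record` are illustrative names, not an existing dd-trace-go API.

```go
package main

import "fmt"

// cappedTrace is a hypothetical per-trace recorder with a configurable
// span limit. Instead of discarding the whole trace, it keeps the first
// `limit` spans and merely counts any spans beyond that.
type cappedTrace struct {
	limit   int
	names   []string
	overCap int
}

// record keeps the span if there is room and reports whether it was kept.
func (c *cappedTrace) record(name string) bool {
	if len(c.names) >= c.limit {
		c.overCap++ // bounded memory: only a counter grows past the cap
		return false
	}
	c.names = append(c.names, name)
	return true
}

func main() {
	c := &cappedTrace{limit: 3}
	for i := 0; i < 10; i++ {
		c.record(fmt.Sprintf("ws.message-%d", i))
	}
	fmt.Printf("kept=%d dropped=%d\n", len(c.names), c.overCap)
}
```

The trade-off is that later spans in a runaway trace are lost, but the process no longer grows without bound.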

johanneswuerbach Nov 15 '22 19:11

Thanks @johanneswuerbach. Will re-open so we can keep discussing this.

katiehockman Nov 15 '22 21:11

@katiehockman We're facing similar memory bloat, and ultimately OOMs, while tracing bidirectional gRPC streams that produce long-running traces. pprof showed setMeta and setMetric holding most of the memory. Once the stream is closed, memory utilization returns to normal.

Abhishekvrshny Nov 20 '22 15:11

Thanks for the details and analysis @Abhishekvrshny. We'll get back to you about this shortly.

katiehockman Nov 21 '22 14:11

Any update on this? We would love to use tracing to troubleshoot issues with websockets, but those are potentially long-running and might accumulate a lot of spans.

johanneswuerbach May 31 '23 20:05

Hi @johanneswuerbach, this is something we're actively working on. Note, though, that partial flushing will not directly help with long-running spans. Long-running spans (>1hr) may not display correctly in the user interface and may be difficult to navigate. Partial flushing specifically helps relieve memory pressure in situations where many finished spans sit under an unfinished span. If you have a use case for long-running spans, feel free to open a ticket with Datadog Support so we can best prioritize that work!
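
The idea described above can be sketched as follows: once at least `minSpans` spans of a trace have finished, those finished spans are sent as a chunk while the unfinished ones (such as a long-running root) stay buffered. This is a hypothetical, simplified model; `partialTrace`, `pendingSpan`, and the stubbed `flush` callback are illustrative names, not dd-trace-go's implementation.

```go
package main

import "fmt"

// pendingSpan is a hypothetical stand-in for a span held by the tracer.
type pendingSpan struct {
	name     string
	finished bool
}

// partialTrace buffers the spans of one trace and flushes finished spans
// in chunks instead of waiting for the root span to finish.
type partialTrace struct {
	minSpans int
	spans    []*pendingSpan
	flush    func([]*pendingSpan) // sends a chunk to the agent (stubbed here)
}

// start registers a new, unfinished span in the trace.
func (t *partialTrace) start(name string) *pendingSpan {
	s := &pendingSpan{name: name}
	t.spans = append(t.spans, s)
	return s
}

// finish marks s as done. Finished spans are flushed when at least minSpans
// have accumulated, or when the whole trace is complete; unfinished spans
// stay in memory. The linear scan keeps the sketch simple, not fast.
func (t *partialTrace) finish(s *pendingSpan) {
	s.finished = true
	var done, open []*pendingSpan
	for _, sp := range t.spans {
		if sp.finished {
			done = append(done, sp)
		} else {
			open = append(open, sp)
		}
	}
	if len(open) == 0 || len(done) >= t.minSpans {
		t.flush(done)
		t.spans = open
	}
}

func main() {
	var chunks [][]*pendingSpan
	tr := &partialTrace{minSpans: 2, flush: func(c []*pendingSpan) { chunks = append(chunks, c) }}
	root := tr.start("websocket.session")
	for i := 0; i < 5; i++ {
		tr.finish(tr.start(fmt.Sprintf("ws.message-%d", i)))
	}
	tr.finish(root)
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d spans\n", i, len(c))
	}
}
```

While the root is open, at most `minSpans` finished children (plus the open spans) are held at any time, which is the memory relief described above; the root's own duration is unchanged.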

ajgajg1134 May 31 '23 20:05

Hey @ajgajg1134, in our case the websocket itself (the long-running span) isn't the interesting part; it's the short spans that happen during its lifecycle.

johanneswuerbach May 31 '23 20:05

Beta support for partial flushing has now been released in dd-trace-go! Please try it out and give us feedback, and let support know if you run into any issues.

katiehockman Aug 10 '23 19:08

Thank you, @katiehockman!

yaadata Aug 10 '23 20:08