dd-trace-go icon indicating copy to clipboard operation
dd-trace-go copied to clipboard

EXPERIMENTAL: Long Running Spans support

Open ajgajg1134 opened this issue 2 years ago • 2 comments

Things this should support:

  1. View currently running long-running traces
  2. View complete long-running traces after they've completed

This is just an initial POC, but testing locally shows some initial success. I haven't done a lot of performance testing, although I did run some benchmarks with this implementation and with a sync.Map and due to the write-heavy nature of this found the sync.Map to be far worse in performance (~100% slower), albeit the benchmarks are pretty synthetic in nature. I do anticipate that the locking of the whole map while the work loop runs could cause spikes in latency for span creation/finishing, so any suggestions on how to handle that are welcome.

UPDATE: Moving this out of Draft so we can get some new reviews and also possibly merge it as an experimental feature to make testing / experimentation internally a bit easier.

ajgajg1134 avatar Mar 07 '22 19:03 ajgajg1134

Not sure if this is related but the title makes me think it could solve the situation of https://github.com/DataDog/dd-trace-go/pull/912 where we didn't figure something out for the long-running http websocket spans.

Julio-Guerra avatar Mar 09 '22 10:03 Julio-Guerra

I do anticipate that the locking of the whole map while the work loop runs could cause spikes in latency for span creation/finishing, so any suggestions on how to handle that are welcome.

Brendan Gregg's Systems Performance book has a small section on this (5.2.5) that suggests that a map of locks could work well. The number of hash table buckets should be >= the number of CPU cores, and the hashing algorithm might just use the lower order bits of the *span pointer (i.e. no actual hashing needed). So if the contention on a central mutex becomes problematic here, I think we can significantly reduce it if needed.

Warning: I haven't done such an implementation myself before, and there are subtle footguns (e.g. putting the lock data structures too close to each other in memory can cause false sharing problems with cache lines). So ideally we probably want a battle tested implementation of this. (FWIW: I also haven't looked at the sync.Map implementation, maybe it's already doing exactly what I'm describing, in which case you can completely ignore it : p)

felixge avatar Mar 09 '22 11:03 felixge

Closing this until there's support from all the tracers for long running spans

ajgajg1134 avatar Nov 15 '22 20:11 ajgajg1134