dd-trace-go
dd-trace-go copied to clipboard
EXPERIMENTAL: Long Running Spans support
Things this should support:
- View currently running long-running traces
- View complete long-running traces after they've completed
This is just an initial POC, but testing locally shows some initial success. I haven't done a lot of performance testing, although I did run some benchmarks with this implementation and with a sync.Map
and due to the write-heavy nature of this found the sync.Map to be far worse in performance (~100% slower), albeit the benchmarks are pretty synthetic in nature. I do anticipate that the locking of the whole map while the work
loop runs could cause spikes in latency for span creation/finishing, so any suggestions on how to handle that are welcome.
UPDATE: Moving this out of Draft so we can get some new reviews and also possibly merge it as an experimental feature to make testing / experimentation internally a bit easier.
Not sure if this is related but the title makes me think it could solve the situation of https://github.com/DataDog/dd-trace-go/pull/912 where we didn't figure something out for the long-running http websocket spans.
I do anticipate that the locking of the whole map while the work loop runs could cause spikes in latency for span creation/finishing, so any suggestions on how to handle that are welcome.
Brendan Gregg's Systems Performance book has a small section on this (5.2.5) that suggests that a map of locks could work well. The number of hash table buckets should be >= the number of CPU cores, and the hashing algorithm might just use the lower order bits of the *span
pointer (i.e. no actual hashing needed). So if the contention on a central mutex becomes problematic here, I think we can significantly reduce it if needed.
Warning: I haven't done such an implementation myself before, and there are subtle footguns (e.g. putting the lock data structures too close to each other in memory can cause false sharing problems with cache lines). So ideally we probably want a battle tested implementation of this. (FWIW: I also haven't looked at the sync.Map implementation, maybe it's already doing exactly what I'm describing, in which case you can completely ignore it : p)
Closing this until there's support from all the tracers for long running spans