heroic
Fix "...Span <span name> is GC'ed without being ended." issue (caused by a BT timeout)
Hundreds of tracing spans are left un-ended after every query timeout
- I am a prism goalie
- Who wants to have a stable heroic
- So that I can focus on features and not get woken up at night and have angry users
These un-ended spans represent a real runtime risk to heroic. If ~700-1000 of these are left hanging around after each timed-out query, it's conceivable that the JVM will:
- potentially run out of memory altogether
- experience much longer GC pauses / sweep times (because all the hanging spans need reaping)
- hugely inflate the size of heroic's logs, costing us $$$ and obscuring "genuine" problems
Proposed Solution
- find the correct location to `catch` the BT timeout exception (not trivial)
- `catch` it, end the span and `throw` it out again
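A minimal sketch of the catch / end / rethrow idea, not heroic's actual code. To keep it runnable without the OpenCensus dependency, `Span` here is a hypothetical stand-in for `io.opencensus.trace.Span` (only `end()` is modelled), and `SpanEndingSketch`, `RecordingSpan`, `traced` and `tracedAsync` are illustrative names. It also assumes the timeout may surface either as a thrown exception or as a failed future, which is why both shapes are shown.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

class SpanEndingSketch {
    /** Hypothetical stand-in for io.opencensus.trace.Span; only end() matters here. */
    interface Span {
        void end();
    }

    /** Test double that records whether end() was called. */
    static final class RecordingSpan implements Span {
        volatile boolean ended = false;

        @Override
        public void end() {
            ended = true;
        }
    }

    /** Synchronous path: catch the failure, end the span, throw it out again. */
    static <T> T traced(Span span, Callable<T> work) throws Exception {
        try {
            return work.call();
        } catch (Exception e) {
            span.end(); // without this, the span is only "ended" by the GC
            throw e;    // callers still see the original timeout
        }
    }

    /** Async path: end the span when the future fails; the failure still propagates. */
    static <T> CompletableFuture<T> tracedAsync(Span span, CompletableFuture<T> future) {
        return future.whenComplete((result, error) -> {
            if (error != null) {
                span.end();
            }
        });
    }

    public static void main(String[] args) {
        RecordingSpan span = new RecordingSpan();
        // Simulated Bigtable timeout (assumption: it surfaces as an exception).
        Callable<String> timedOut = () -> {
            throw new TimeoutException("simulated BT timeout");
        };
        try {
            traced(span, timedOut);
        } catch (Exception e) {
            System.out.println("rethrown: " + e.getMessage() + ", span ended: " + span.ended);
        }
    }
}
```

If the success path does not already end the span elsewhere, a `finally` block (or an unconditional `span.end()` in `whenComplete`) would cover both outcomes in one place.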
Repro Steps
- run heroic locally with the GUC config, on branch feature/add-bigtable-timeout-settings-refactored
- capture a lengthy query from Grafana using the Chrome DevTools network tab
- alter the query to hit localhost and watch the logs; you'll see the error messages listed below
List of methods concerned from logs
- ERROR io.opencensus.trace.Tracer - Span localMetricsManager.fetchSeries is GC'ed without being ended.
- ERROR io.opencensus.trace.Tracer - Span bigtable.fetchBatch is GC'ed without being ended.
FYI @adsail, moving to inbox as it's not something we'll need to tackle until more aggressive timeouts are deployed