High-throughput data sources stall output

Open jra3 opened this issue 8 years ago • 1 comments

Use case: 10s-100s of thousands of items per second

Repro Steps: log a LOT of items from a lot of goroutines, then watch with:

  curl -X WATCH localhost:4040/tap/

Symptom: The first several hundred items come through, then all output abruptly stops

Findings so far:

  • The 'timeout' branch of HandleItems is hit due to high load
  • The data source is marked active = false
  • processItemChan breaks out of its loop

What should happen: We should (silently?) drop items, but keep trying to pump them out.

https://github.com/uber-go/gwr/blob/dev/internal/marshaled/source.go#L411
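The "drop, but keep pumping" behavior described above could look roughly like the following. This is a minimal sketch and not the gwr implementation: the `dropper` type, its `offer` and `drops` methods, and the buffer size are all hypothetical names chosen for illustration. The idea is a non-blocking send that counts a drop when the buffer is full instead of deactivating the source.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dropper is a hypothetical stand-in for a data source's item pump.
// It is NOT the gwr type; it only illustrates drop-on-overflow.
type dropper struct {
	items   chan interface{}
	dropped uint64 // updated atomically; many goroutines may call offer
}

func newDropper(buf int) *dropper {
	return &dropper{items: make(chan interface{}, buf)}
}

// offer tries to enqueue an item without blocking. When the buffer is
// full, it counts a drop and returns false; the source stays active.
func (d *dropper) offer(item interface{}) bool {
	select {
	case d.items <- item:
		return true
	default:
		atomic.AddUint64(&d.dropped, 1)
		return false
	}
}

// drops returns how many items have been dropped so far.
func (d *dropper) drops() uint64 {
	return atomic.LoadUint64(&d.dropped)
}

func main() {
	d := newDropper(2)
	for i := 0; i < 5; i++ {
		d.offer(i)
	}
	fmt.Println("queued:", len(d.items), "dropped:", d.drops())

	<-d.items // a slow watcher catches up by one item
	fmt.Println("accepted after drain:", d.offer(5))
}
```

Because nothing is ever marked inactive, a burst only costs the items that did not fit; output resumes as soon as the consumer catches up.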

jra3 • Jul 21 '16 19:07

The original design intent was to default to dropping items: silently at first / by default, but with an optional affordance for watchers that want to know whether, when, and how many drops are happening. E.g. the resp protocol could very easily pass along such side-channel data with little chance of it being confused with the actual watched items.
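One way to express such an optional affordance in Go is an opt-in interface checked by type assertion. The sketch below is only an illustration under assumed names: `itemSink`, `dropAware`, and `notifyDrops` are hypothetical and do not come from gwr. Watchers that implement only the base interface never see drop data; those that also implement the extension receive a count.

```go
package main

import "fmt"

// itemSink is a hypothetical minimal watcher: it only receives items.
type itemSink interface {
	Item(b []byte) error
}

// dropAware is a hypothetical opt-in extension for watchers that want
// to be told how many items were dropped.
type dropAware interface {
	Dropped(n uint64) error
}

// notifyDrops reports n dropped items to w, but only if w opted in by
// implementing dropAware. It returns whether a notice was delivered.
func notifyDrops(w itemSink, n uint64) (bool, error) {
	if n == 0 {
		return false, nil
	}
	da, ok := w.(dropAware)
	if !ok {
		return false, nil // default: drops stay silent
	}
	return true, da.Dropped(n)
}

// quietSink ignores drops entirely.
type quietSink struct{ items int }

func (q *quietSink) Item(b []byte) error { q.items++; return nil }

// countingSink opts in to drop notices.
type countingSink struct {
	quietSink
	drops uint64
}

func (c *countingSink) Dropped(n uint64) error { c.drops += n; return nil }

func main() {
	q, c := &quietSink{}, &countingSink{}
	for _, w := range []itemSink{q, c} {
		told, _ := notifyDrops(w, 7)
		fmt.Printf("%T told=%v\n", w, told)
	}
	fmt.Println("drops seen:", c.drops)
}
```

How the count would actually travel over a given wire protocol is left open here; the sketch only shows the opt-in shape on the watcher side.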

Currently it seems that the following happens:

  1. marshaled.DataSource.HandleItem deactivates on first timeout
  2. the shutdown phase in the tail of marshaled.DataSource.processItem isn't successfully closing all active watchers

Part one is just our current naive, perhaps overly aggressive, design choice; it could be that something more like a circuit breaker's "only deactivate if more than X% get dropped within T time" rule would be better.

Part two is a flat-out bug: something's causing that HTTP connection to zombie on.

jcorbin • Jul 21 '16 19:07