gwr Output from high throughput data sources will stall output

Status: open. jra3 opened this issue 8 years ago; 1 comment.

Use case: tens to hundreds of thousands of items per second.

Repro steps: log a very large number of items from many goroutines while watching the source with curl -X WATCH localhost:4040/tap/

Symptom: The first several hundred items come through, then all output abruptly stops

Findings so far:

1. Under high load, the 'timeout' branch of HandleItems is hit.
2. The data source is then marked active = false.
3. processItemChan breaks out of its loop, so nothing more is sent.

What should happen: we should drop items (possibly silently) but keep trying to pump the rest out.
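One way to "drop but keep pumping" is a non-blocking send: a select with a default case, so a full buffer discards the item instead of blocking or deactivating anything. This is a minimal sketch, not gwr's code; watcher, newWatcher, and offer are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// watcher is a hypothetical stand-in for one connected WATCH client; it is
// not gwr's actual type. The buffered channel decouples producers from a
// slow consumer.
type watcher struct {
	items   chan []byte
	dropped atomic.Uint64 // items discarded because the buffer was full
}

func newWatcher(buffer int) *watcher {
	return &watcher{items: make(chan []byte, buffer)}
}

// offer enqueues item without ever blocking the caller. When the buffer is
// full the item is dropped and counted, but the watcher stays active, so
// later items still flow once the consumer catches up. Safe to call from
// many goroutines: channel sends and the atomic counter need no extra lock.
func (w *watcher) offer(item []byte) bool {
	select {
	case w.items <- item:
		return true
	default:
		w.dropped.Add(1)
		return false
	}
}

func main() {
	w := newWatcher(2)
	for i := 0; i < 5; i++ {
		w.offer([]byte(fmt.Sprintf("item %d", i)))
	}
	fmt.Println(len(w.items), w.dropped.Load()) // prints "2 3"
}
```

The key design choice is that a drop is a per-item event, not a state change: nothing here flips the watcher to inactive.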

https://github.com/uber-go/gwr/blob/dev/internal/marshaled/source.go#L411

Jul 21 '16 19:07 jra3

The original design intent was to drop items by default: silently at first, but with an optional affordance for watchers that want to know whether, when, and how many drops are happening. For example, the resp protocol could easily pass such side-channel data along with little chance of it being confused with the actual watched items.
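The opt-in drop reporting described above could look roughly like this: every watcher counts its drops, but only watchers that asked for it receive a separate notice record before the next delivered item. This is a hedged sketch; message, watcher, wantDrops, and offer are hypothetical, and it assumes offer is called from a single pump goroutine per watcher.

```go
package main

import "fmt"

// message is either a watched item or, when dropped > 0, a side-channel
// notice saying how many items were lost just before this point.
type message struct {
	item    []byte
	dropped uint64
}

// watcher is hypothetical: wantDrops models the opt-in affordance.
// offer is assumed to run on one goroutine per watcher (no locking).
type watcher struct {
	out       chan message
	wantDrops bool
	pending   uint64 // drops not yet reported to this watcher
}

func (w *watcher) offer(item []byte) {
	// Opted-in watchers first get a notice covering any earlier drops.
	if w.wantDrops && w.pending > 0 {
		select {
		case w.out <- message{dropped: w.pending}:
			w.pending = 0
		default:
			w.pending++ // no room even for the notice; drop this item too
			return
		}
	}
	select {
	case w.out <- message{item: item}:
	default:
		w.pending++ // counted; reported later only to opted-in watchers
	}
}

func main() {
	w := &watcher{out: make(chan message, 2), wantDrops: true}
	for _, s := range []string{"a", "b", "c"} {
		w.offer([]byte(s)) // "c" is dropped: the buffer holds only two
	}
	<-w.out
	<-w.out
	w.offer([]byte("d"))
	fmt.Println((<-w.out).dropped, string((<-w.out).item)) // prints "1 d"
}
```

Because the notice is its own record type rather than a field on an item, a protocol like resp could encode it distinctly so clients never mistake it for watched data.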

Currently it seems that the following happens:

1. marshaled.DataSource.HandleItem deactivates the data source on the first timeout.
2. The shutdown phase at the tail of marshaled.DataSource.processItem does not successfully close all active watchers.

Part one is simply our current naive, perhaps overly aggressive, design choice; something more like a circuit breaker ("only deactivate if more than X% of items are dropped within time T") might be better.

Part two is a flat-out bug: something is causing that HTTP connection to linger on as a zombie.
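For contrast with the zombie connection, here is a sketch of a shutdown that does close every active watcher: each consumer ranges over its channel, so closing it under the same lock that guards sends ends the loop (and would let an HTTP handler return). The source type and its methods are hypothetical, not gwr's processItem code.

```go
package main

import (
	"fmt"
	"sync"
)

// source is a hypothetical stand-in for a data source with active watchers.
type source struct {
	mu       sync.Mutex
	watchers map[int]chan string
	nextID   int
	closed   bool
}

func newSource() *source { return &source{watchers: make(map[int]chan string)} }

// watch registers a watcher; the consumer ranges over the returned channel,
// and that loop ends when shutdown closes it.
func (s *source) watch(buffer int) (<-chan string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return nil, false
	}
	ch := make(chan string, buffer)
	s.watchers[s.nextID] = ch
	s.nextID++
	return ch, true
}

// emit fans an item out without blocking; full watchers just miss it.
// Holding mu guarantees we never send on a channel shutdown has closed.
func (s *source) emit(item string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, ch := range s.watchers {
		select {
		case ch <- item:
		default:
		}
	}
}

// shutdown closes every registered watcher exactly once, so no consumer is
// left waiting forever on a channel that will never receive again.
func (s *source) shutdown() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return
	}
	s.closed = true
	for id, ch := range s.watchers {
		close(ch)
		delete(s.watchers, id)
	}
}

func main() {
	s := newSource()
	ch, _ := s.watch(4)
	done := make(chan int)
	go func() {
		n := 0
		for range ch { // ends only when shutdown closes ch
			n++
		}
		done <- n
	}()
	s.emit("a")
	s.emit("b")
	s.shutdown()
	fmt.Println("received", <-done) // prints "received 2"
}
```

Buffered items are still delivered after close, so a clean shutdown does not lose what was already queued.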

Jul 21 '16 19:07 jcorbin
