mimir
mimir copied to clipboard
frontend: use zero-allocation response decoder in shard active series middleware
What this PR does
At certain times we've observed in ops huge spikes of allocations during the deserialization/processing of each of the query responses when merging the results in the shard active series frontend middleware, occasionally reaching values close to ~80%
of total allocations.
While this condition does not critically compromise the service, nor does imply a customer impact, such spikes translate into high CPU consumption that can negatively impact the TCO.
This PR aims to solve this problem through the implementation of a zero-allocation decoder, specifically adapted to the use case of said middleware, that borrows certain functions from the json-iter library, currently in use.
Broadly speaking, the way this new decoder works is by copying all contents of the response data
key into an internal buffer that is subsequently reused once is received and consumed by the merger goroutine, rather than copying each of the strings associated with all label-value pairs.
Below are the comparative benchmark results before and after the change.
name old time/op new time/op delta
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-4-8 8.93µs ± 0% 7.51µs ± 2% -15.86% (p=0.016 n=4+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-16-8 15.9µs ± 1% 17.0µs ± 0% +6.66% (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-64-8 45.0µs ± 1% 47.3µs ± 5% +5.32% (p=0.032 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-128-8 77.8µs ± 0% 82.7µs ± 0% +6.39% (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-4-8 14.5µs ± 1% 13.3µs ± 3% -8.24% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-16-8 25.4µs ± 3% 25.7µs ± 1% ~ (p=0.310 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-64-8 59.9µs ± 1% 65.3µs ± 0% +8.99% (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-128-8 91.7µs ± 2% 136.8µs ±21% +49.12% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-4-8 1.90ms ± 0% 0.58ms ±12% -69.26% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-16-8 7.22ms ± 0% 2.18ms ±12% -69.82% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-64-8 28.3ms ± 1% 8.0ms ±22% -71.65% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-128-8 56.1ms ± 1% 14.5ms ± 6% -74.14% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-4-8 18.8ms ± 0% 5.6ms ± 4% -69.95% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-16-8 72.2ms ± 0% 21.6ms ± 7% -70.11% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-64-8 281ms ± 1% 76ms ± 2% -73.10% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-128-8 558ms ± 3% 153ms ± 5% -72.65% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-4-8 1.98kB ± 0% 1.74kB ± 1% -12.15% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-16-8 3.90kB ± 0% 2.96kB ± 1% -24.18% (p=0.016 n=4+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-64-8 11.6kB ± 0% 9.3kB ± 2% -19.71% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-128-8 22.0kB ± 0% 21.1kB ± 2% -3.91% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-4-8 2.75kB ± 5% 2.50kB ± 3% -9.15% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-16-8 4.91kB ± 6% 4.04kB ± 6% -17.75% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-64-8 13.0kB ± 4% 12.0kB ± 8% -7.92% (p=0.032 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-128-8 23.5kB ± 4% 30.6kB ± 0% +30.28% (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-4-8 406kB ± 0% 95kB ± 1% -76.50% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-16-8 1.37MB ± 0% 0.12MB ± 2% -90.95% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-64-8 5.25MB ± 0% 0.52MB ±15% -90.05% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-128-8 10.5MB ± 0% 1.4MB ±22% -86.75% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-4-8 3.29MB ± 0% 0.88MB ± 0% -73.15% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-16-8 12.9MB ± 0% 2.2MB ± 2% -82.81% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-64-8 51.7MB ± 0% 4.8MB ±29% -90.81% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-128-8 104MB ± 0% 10MB ±28% -90.44% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-4-8 60.0 ± 0% 24.0 ± 0% -60.00% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-16-8 192 ± 0% 48 ± 0% -75.00% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-64-8 720 ± 0% 144 ± 0% -80.00% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-128-8 1.42k ± 0% 0.27k ± 0% -80.90% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-4-8 67.0 ± 0% 31.0 ± 0% -53.73% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-16-8 199 ± 0% 55 ± 0% -72.36% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-64-8 727 ± 0% 151 ± 0% -79.23% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-128-8 1.43k ± 0% 0.28k ± 0% -80.50% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-4-8 32.1k ± 0% 0.0k ± 0% -99.92% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-16-8 128k ± 0% 0k ± 0% ~ (p=0.079 n=4+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-64-8 513k ± 0% 0k ± 0% -99.97% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-128-8 1.03M ± 0% 0.00M ± 1% -99.97% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-4-8 320k ± 0% 0k ± 0% ~ (p=0.079 n=4+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-16-8 1.28M ± 0% 0.00M ± 0% -100.00% (p=0.000 n=5+4)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-64-8 5.12M ± 0% 0.00M ± 5% -100.00% (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-128-8 10.2M ± 0% 0.0M ± 9% -100.00% (p=0.008 n=5+5)
Raw benchmark results:
Which issue(s) this PR fixes or relates to
Fixes https://github.com/grafana/mimir-squad/issues/1870
Checklist
- [X] Tests updated.
- [ ] Documentation added.
- [ ]
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
. - [ ]
about-versioning.md
updated with experimental features.
Cool, thanks for working on this! I'll try to give it a closer look soon, but one thing that occurred to me: Do you think it's worth and feasible putting this behind a feature flag to make rollback easier in case any unexpected issues appear? Not sure how risky you deem this change.
Do you think it's worth and feasible putting this behind a feature flag to make rollback easier in case any unexpected issues appear?
I believe it makes complete sense and is perfectly viable. Once we've tested that everything works as expected we can get rid of the old code. 👍
I'll see to rework the PR to allow both code paths. In the meantime, I'll mark the PR as a draft again.
Added new -query-frontend.use-shard-active-series-zero-allocation-decoder
experimental option to enable/disable this feature at will.
@flxbk @Logiraptor this is ready for review now
pinging @flxbk
ready for review again 👍
I do worry about debugging this if it ever breaks, so I added a fuzz test which compares this PR to the stdlib json result. The first failure is the input "{"data":[]" due to the missing }. The code in the PR accepts that invalid json. I understand that this code is extremely fast, but it makes me nervous at the same time. Do you think making the code fail on invalid json will eliminate the performance gains?
If I remember correctly, the current implementation would also accept the input {"data": [...]
, as the processing of the querier response stops as soon as the content of the data
key has been processed.
Also, after the change in 23e736b4a289ba1efaa27651d984402c3e6e1fd2, we now ensure that the content of said key at least takes the form [{...},{...}, ...]
, although it is true that the objects contained in the array can contain any random data, and such content would be propagated to the frontend. Which, on the other hand, I don't see as a cause for concern. If the querier returns corrupt data, it would make sense for the frontend to propagate such data, thereby highlighting the problem instead of covering it up.
pinging @flxbk @Logiraptor for a new review