mimir icon indicating copy to clipboard operation
mimir copied to clipboard

frontend: use zero-allocation response decoder in shard active series middleware

Open ortuman opened this issue 3 months ago • 4 comments

What this PR does

At certain times we've observed in ops huge spikes of allocations during the deserialization/processing of each of the query responses when merging the results in the shard active series frontend middleware, occasionally reaching values close to ~80% of total allocations.

Screenshot 2024-03-21 at 09 28 42

While this condition does not critically compromise the service, nor does imply a customer impact, such spikes translate into high CPU consumption that can negatively impact the TCO.

This PR aims to solve this problem through the implementation of a zero-allocation decoder, specifically adapted to the use case of said middleware, that borrows certain functions from the json-iter library, currently in use.

Broadly speaking, the way this new decoder works is by copying all contents of the response data key into an internal buffer that is subsequently reused once is received and consumed by the merger goroutine, rather than copying each of the strings associated with all label-value pairs.

Below are the comparative benchmark results before and after the change.

name                                                                         old time/op    new time/op    delta
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-4-8           8.93µs ± 0%    7.51µs ± 2%   -15.86%  (p=0.016 n=4+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-16-8          15.9µs ± 1%    17.0µs ± 0%    +6.66%  (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-64-8          45.0µs ± 1%    47.3µs ± 5%    +5.32%  (p=0.032 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-128-8         77.8µs ± 0%    82.7µs ± 0%    +6.39%  (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-4-8         14.5µs ± 1%    13.3µs ± 3%    -8.24%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-16-8        25.4µs ± 3%    25.7µs ± 1%      ~     (p=0.310 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-64-8        59.9µs ± 1%    65.3µs ± 0%    +8.99%  (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-128-8       91.7µs ± 2%   136.8µs ±21%   +49.12%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-4-8       1.90ms ± 0%    0.58ms ±12%   -69.26%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-16-8      7.22ms ± 0%    2.18ms ±12%   -69.82%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-64-8      28.3ms ± 1%     8.0ms ±22%   -71.65%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-128-8     56.1ms ± 1%    14.5ms ± 6%   -74.14%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-4-8      18.8ms ± 0%     5.6ms ± 4%   -69.95%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-16-8     72.2ms ± 0%    21.6ms ± 7%   -70.11%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-64-8      281ms ± 1%      76ms ± 2%   -73.10%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-128-8     558ms ± 3%     153ms ± 5%   -72.65%  (p=0.008 n=5+5)

name                                                                         old alloc/op   new alloc/op   delta
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-4-8           1.98kB ± 0%    1.74kB ± 1%   -12.15%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-16-8          3.90kB ± 0%    2.96kB ± 1%   -24.18%  (p=0.016 n=4+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-64-8          11.6kB ± 0%     9.3kB ± 2%   -19.71%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-128-8         22.0kB ± 0%    21.1kB ± 2%    -3.91%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-4-8         2.75kB ± 5%    2.50kB ± 3%    -9.15%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-16-8        4.91kB ± 6%    4.04kB ± 6%   -17.75%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-64-8        13.0kB ± 4%    12.0kB ± 8%    -7.92%  (p=0.032 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-128-8       23.5kB ± 4%    30.6kB ± 0%   +30.28%  (p=0.016 n=5+4)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-4-8        406kB ± 0%      95kB ± 1%   -76.50%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-16-8      1.37MB ± 0%    0.12MB ± 2%   -90.95%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-64-8      5.25MB ± 0%    0.52MB ±15%   -90.05%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-128-8     10.5MB ± 0%     1.4MB ±22%   -86.75%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-4-8      3.29MB ± 0%    0.88MB ± 0%   -73.15%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-16-8     12.9MB ± 0%     2.2MB ± 2%   -82.81%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-64-8     51.7MB ± 0%     4.8MB ±29%   -90.81%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-128-8     104MB ± 0%      10MB ±28%   -90.44%  (p=0.008 n=5+5)

name                                                                         old allocs/op  new allocs/op  delta
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-4-8             60.0 ± 0%      24.0 ± 0%   -60.00%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-16-8             192 ± 0%        48 ± 0%   -75.00%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-64-8             720 ± 0%       144 ± 0%   -80.00%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=none/num-responses-128-8          1.42k ± 0%     0.27k ± 0%   -80.90%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-4-8           67.0 ± 0%      31.0 ± 0%   -53.73%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-16-8           199 ± 0%        55 ± 0%   -72.36%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-64-8           727 ± 0%       151 ± 0%   -79.23%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/encoding=snappy/num-responses-128-8        1.43k ± 0%     0.28k ± 0%   -80.50%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-4-8        32.1k ± 0%      0.0k ± 0%   -99.92%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-16-8        128k ± 0%        0k ± 0%      ~     (p=0.079 n=4+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-64-8        513k ± 0%        0k ± 0%   -99.97%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=1_000/num-responses-128-8      1.03M ± 0%     0.00M ± 1%   -99.97%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-4-8        320k ± 0%        0k ± 0%      ~     (p=0.079 n=4+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-16-8      1.28M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=5+4)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-64-8      5.12M ± 0%     0.00M ± 5%  -100.00%  (p=0.008 n=5+5)
ActiveSeriesMiddlewareMergeResponses/seriesCount=10_000/num-responses-128-8     10.2M ± 0%      0.0M ± 9%  -100.00%  (p=0.008 n=5+5)

Raw benchmark results:

Which issue(s) this PR fixes or relates to

Fixes https://github.com/grafana/mimir-squad/issues/1870

Checklist

  • [X] Tests updated.
  • [ ] Documentation added.
  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [ ] about-versioning.md updated with experimental features.

ortuman avatar Mar 19 '24 16:03 ortuman

Cool, thanks for working on this! I'll try to give it a closer look soon, but one thing that occurred to me: Do you think it's worth and feasible putting this behind a feature flag to make rollback easier in case any unexpected issues appear? Not sure how risky you deem this change.

flxbk avatar Mar 21 '24 09:03 flxbk

Do you think it's worth and feasible putting this behind a feature flag to make rollback easier in case any unexpected issues appear?

I believe it makes complete sense and is perfectly viable. Once we've tested that everything works as expected we can get rid of the old code. 👍

I'll see to rework the PR to allow both code paths. In the meantime, I'll mark the PR as a draft again.

ortuman avatar Mar 21 '24 10:03 ortuman

Added new -query-frontend.use-shard-active-series-zero-allocation-decoder experimental option to enable/disable this feature at will.

@flxbk @Logiraptor this is ready for review now

ortuman avatar Mar 21 '24 16:03 ortuman

pinging @flxbk

ready for review again 👍

ortuman avatar Mar 25 '24 10:03 ortuman

I do worry about debugging this if it ever breaks, so I added a fuzz test which compares this PR to the stdlib json result. The first failure is the input "{"data":[]" due to the missing }. The code in the PR accepts that invalid json. I understand that this code is extremely fast, but it makes me nervous at the same time. Do you think making the code fail on invalid json will eliminate the performance gains?

If I remember correctly, the current implementation would also accept the input {"data": [...], as the processing of the querier response stops as soon as the content of the data key has been processed.

Also, after the change in 23e736b4a289ba1efaa27651d984402c3e6e1fd2, we now ensure that the content of said key at least takes the form [{...},{...}, ...], although it is true that the objects contained in the array can contain any random data, and such content would be propagated to the frontend. Which, on the other hand, I don't see as a cause for concern. If the querier returns corrupt data, it would make sense for the frontend to propagate such data, thereby highlighting the problem instead of covering it up.

ortuman avatar Apr 01 '24 13:04 ortuman

pinging @flxbk @Logiraptor for a new review

ortuman avatar Apr 01 '24 13:04 ortuman