
Investigation into query scheduler dimensions

Logiraptor opened this issue 7 months ago · 1 comment

This issue is meant to aid in the investigation of https://github.com/grafana/mimir-squad/issues/1945. I'll reference code from this branch throughout: https://github.com/grafana/mimir/tree/logiraptor/prioritized-queues

Reproducing the problem

  1. Add flags to artificially slow down ingesters and store-gateways (a rough sketch of what such a flag could look like follows the setup steps below)
  2. Add a k6 test and mimir helm config that can reproduce the issue

The basic idea is this:

  1. We run mimir with a single querier and concurrency = 4
  2. This means for each second of wall time, we have 4s of capacity for queries
  3. We set ingesters to 1s slowdown
  4. We set store-gateways to 5s slowdown
  5. Then we run 1 ingester query and 1 store-gateway query per second via k6
  6. This is 6s of load per 1s of wall time, so we do not have enough capacity to service this load
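
To make step 1 concrete, here is a rough sketch of what such a slowdown flag could look like. The package, flag name, and call site are all invented for illustration only; the real changes are on the logiraptor/prioritized-queues branch linked above.

```go
// Hypothetical sketch of an artificial-slowdown flag; not the branch code.
package slowdown

import (
	"flag"
	"time"
)

// Config holds the testing-only delay injected into a component's query path.
type Config struct {
	ArtificialDelay time.Duration
}

// RegisterFlags mirrors the usual Config.RegisterFlags pattern; the flag name
// here is invented for illustration.
func (c *Config) RegisterFlags(f *flag.FlagSet) {
	f.DurationVar(&c.ArtificialDelay, "ingester.artificial-query-delay", 0,
		"Sleep this long before serving each query (load-testing only).")
}

// Delay is called at the start of the query path. With
// -ingester.artificial-query-delay=1s every ingester query takes at least one
// extra second, matching step 3 above (and 5s for store-gateways in step 4).
func (c Config) Delay() {
	if c.ArtificialDelay > 0 {
		time.Sleep(c.ArtificialDelay)
	}
}
```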

Then I tested four versions of mimir with this setup:

Control: Here are the results with main

Full K6 Results

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load-testing-with-k6.js
        output: -

     scenarios: (100.00%) 3 scenarios, 201 max VUs, 10m30s max duration (incl. graceful stop):
              * ingester_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_ingester_query, gracefulStop: 30s)
              * store_gateway_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_storegateway_query, gracefulStop: 30s)
              * writing_metrics: Up to 0.05 iterations/s for 10m0s over 3 stages (maxVUs: 1, exec: write, gracefulStop: 30s)

     ✓ write worked

     █ ingester

       ✗ expected request status to equal 200
        ↳  85% — ✓ 349 / ✗ 57
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     █ store-gateway

       ✗ expected request status to equal 200
        ↳  49% — ✓ 198 / ✗ 206
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     checks........................................: 89.39%  ✓ 2217     ✗ 263  
     ✓ { type:ingester }...........................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:store-gateway }......................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:write }..............................: 100.00% ✓ 29       ✗ 0    
     data_received.................................: 180 kB  285 B/s
     data_sent.....................................: 833 kB  1.3 kB/s
     dropped_iterations............................: 241     0.38253/s
     group_duration................................: avg=1m33s    min=1.02s  med=1m56s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     http_req_blocked..............................: avg=286.87µs min=1µs    med=5µs   max=41.66ms p(90)=553.2µs p(95)=677.2µs 
     http_req_connecting...........................: avg=249µs    min=0s     med=0s    max=41.61ms p(90)=496.2µs p(95)=598.19µs
     http_req_duration.............................: avg=1m29s    min=7.18ms med=1m56s max=2m0s    p(90)=2m0s    p(95)=2m0s    
       { expected_response:true }..................: avg=1m16s    min=7.18ms med=1m18s max=1m59s   p(90)=1m58s   p(95)=1m58s   
     ✗ { type:ingester }...........................: avg=1m31s    min=1.02s  med=1m56s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     ✗ { type:store-gateway }......................: avg=1m34s    min=5.02s  med=2m0s  max=2m0s    p(90)=2m0s    p(95)=2m0s    
     ✓ { url:http://localhost:8080/api/v1/push }...: avg=0s       min=0s     med=0s    max=0s      p(90)=0s      p(95)=0s      
     http_req_failed...............................: 31.34%  ✓ 263      ✗ 576  
     http_req_receiving............................: avg=46.57µs  min=0s     med=51µs  max=830µs   p(90)=79µs    p(95)=100µs   
     http_req_sending..............................: avg=37.95µs  min=5µs    med=24µs  max=760µs   p(90)=66µs    p(95)=79.09µs 
     http_req_tls_handshaking......................: avg=0s       min=0s     med=0s    max=0s      p(90)=0s      p(95)=0s      
     http_req_waiting..............................: avg=1m29s    min=7.07ms med=1m56s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     http_reqs.....................................: 839     1.331714/s
     iteration_duration............................: avg=1m29s    min=8.11ms med=1m56s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     iterations....................................: 839     1.331714/s
     vus...........................................: 150     min=2      max=200
     vus_max.......................................: 201     min=21     max=201


running (10m30.0s), 000/201 VUs, 839 complete and 150 interrupted iterations
ingester_queries      ✓ [======================================] 076/100 VUs  10m0s  1.00 iters/s
store_gateway_queries ✓ [======================================] 074/100 VUs  10m0s  1.00 iters/s
writing_metrics       ✓ [======================================] 0/1 VUs      10m0s  0.05 iters/s
ERRO[0631] thresholds on metrics 'http_req_duration{type:ingester}, http_req_duration{type:store-gateway}' have been crossed
     █ ingester
       ✗ expected request status to equal 200
        ↳  85% — ✓ 349 / ✗ 57

     █ store-gateway

       ✗ expected request status to equal 200
        ↳  49% — ✓ 198 / ✗ 206

     ✗ { type:ingester }...........................: avg=1m31s    min=1.02s  med=1m56s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     ✗ { type:store-gateway }......................: avg=1m34s    min=5.02s  med=2m0s  max=2m0s    p(90)=2m0s    p(95)=2m0s    

Here we see that both ingester and store-gateway queries have high p95 latency, and many requests in both groups failed.

Test 2: Additional Queue Dimensions

For this test, I just enabled the flag for additional queue dimensions.

Full K6 Results

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load-testing-with-k6.js
        output: -

     scenarios: (100.00%) 3 scenarios, 201 max VUs, 10m30s max duration (incl. graceful stop):
              * ingester_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_ingester_query, gracefulStop: 30s)
              * store_gateway_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_storegateway_query, gracefulStop: 30s)
              * writing_metrics: Up to 0.05 iterations/s for 10m0s over 3 stages (maxVUs: 1, exec: write, gracefulStop: 30s)

     ✓ write worked

     █ ingester

       ✗ expected request status to equal 200
        ↳  3% — ✓ 13 / ✗ 370
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     █ store-gateway

       ✓ expected request status to equal 200
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     checks........................................: 84.50%  ✓ 2018     ✗ 370  
     ✓ { type:ingester }...........................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:store-gateway }......................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:write }..............................: 100.00% ✓ 30       ✗ 0    
     data_received.................................: 164 kB  260 B/s
     data_sent.....................................: 861 kB  1.4 kB/s
     dropped_iterations............................: 209     0.331791/s
     group_duration................................: avg=1m18s    min=1s     med=1m21s  max=2m0s   p(90)=2m0s    p(95)=2m0s   
     http_req_blocked..............................: avg=252.43µs min=2µs    med=10µs   max=5.47ms p(90)=537µs   p(95)=602µs  
     http_req_connecting...........................: avg=223.3µs  min=0s     med=0s     max=5.42ms p(90)=479.4µs p(95)=542.2µs
     http_req_duration.............................: avg=1m16s    min=7.24ms med=1m18s  max=2m0s   p(90)=1m59s   p(95)=1m59s  
       { expected_response:true }..................: avg=45.66s   min=7.24ms med=44.66s max=1m48s  p(90)=1m23s   p(95)=1m26s  
     ✗ { type:ingester }...........................: avg=1m58s    min=1s     med=1m59s  max=2m0s   p(90)=1m59s   p(95)=1m59s  
     ✗ { type:store-gateway }......................: avg=47.53s   min=5.01s  med=46.66s max=1m28s  p(90)=1m22s   p(95)=1m26s  
     ✓ { url:http://localhost:8080/api/v1/push }...: avg=0s       min=0s     med=0s     max=0s     p(90)=0s      p(95)=0s     
     http_req_failed...............................: 41.24%  ✓ 370      ✗ 527  
     http_req_receiving............................: avg=35.93µs  min=0s     med=46µs   max=1ms    p(90)=67µs    p(95)=76µs   
     http_req_sending..............................: avg=38.81µs  min=8µs    med=30µs   max=1.03ms p(90)=64µs    p(95)=75.19µs
     http_req_tls_handshaking......................: avg=0s       min=0s     med=0s     max=0s     p(90)=0s      p(95)=0s     
     http_req_waiting..............................: avg=1m16s    min=7.15ms med=1m18s  max=2m0s   p(90)=1m59s   p(95)=1m59s  
     http_reqs.....................................: 897     1.424001/s
     iteration_duration............................: avg=1m16s    min=8.1ms  med=1m18s  max=2m0s   p(90)=2m0s    p(95)=2m0s   
     iterations....................................: 897     1.424001/s
     vus...........................................: 125     min=2      max=174
     vus_max.......................................: 177     min=21     max=177


running (10m29.9s), 000/177 VUs, 897 complete and 125 interrupted iterations
ingester_queries      ✓ [======================================] 075/100 VUs  10m0s  1.00 iters/s
store_gateway_queries ✓ [======================================] 050/076 VUs  10m0s  1.00 iters/s
writing_metrics       ✓ [======================================] 0/1 VUs      10m0s  0.05 iters/s
ERRO[0631] thresholds on metrics 'http_req_duration{type:ingester}, http_req_duration{type:store-gateway}' have been crossed 
     █ ingester

       ✗ expected request status to equal 200
        ↳  3% — ✓ 13 / ✗ 370

     █ store-gateway

       ✓ expected request status to equal 200

     ✗ { type:ingester }...........................: avg=1m58s    min=1s     med=1m59s  max=2m0s   p(90)=1m59s   p(95)=1m59s  
     ✗ { type:store-gateway }......................: avg=47.53s   min=5.01s  med=46.66s max=1m28s  p(90)=1m22s   p(95)=1m26s  

In this test, requests were still slow, but surprisingly none of the store-gateway requests failed, while ingester requests failed far more often.

Test 3: Round-robin between components

For this test, I added a simple round-robin between query components, trying hard not to change anything in the existing code (diff). The basic idea is to extract an interface from queueBroker and then write a new implementation that wraps the existing one.
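
To make that shape concrete, here is a minimal sketch under invented names (broker, roundRobinBroker, and the component strings are all assumptions; the actual interface and implementation are in the linked diff):

```go
package queue

import "errors"

// errEmpty signals that no request is currently queued for a component.
var errEmpty = errors.New("no requests queued")

// broker is a hypothetical interface extracted from the existing queueBroker.
type broker interface {
	enqueueRequest(component string, req any) error
	dequeueNextRequest() (any, error)
}

// roundRobinBroker implements the same interface by keeping one wrapped broker
// per query component and rotating dequeues across the components, so a long
// backlog of store-gateway queries cannot block ingester queries purely by
// arrival order (and vice versa).
type roundRobinBroker struct {
	components []string          // e.g. "ingester", "store-gateway", "ingester-and-store-gateway"
	inner      map[string]broker // one wrapped broker per component
	next       int               // index of the component to try first on the next dequeue
}

func (r *roundRobinBroker) enqueueRequest(component string, req any) error {
	return r.inner[component].enqueueRequest(component, req)
}

func (r *roundRobinBroker) dequeueNextRequest() (any, error) {
	// Try each component once, starting from the one after the component that
	// was served most recently.
	for i := 0; i < len(r.components); i++ {
		idx := (r.next + i) % len(r.components)
		if req, err := r.inner[r.components[idx]].dequeueNextRequest(); err == nil {
			r.next = (idx + 1) % len(r.components)
			return req, nil
		}
	}
	return nil, errEmpty
}
```

Wrapping the existing broker rather than modifying it keeps the current queueBroker behaviour untouched, which matches the goal of not changing existing code.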

Full K6 Results

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load-testing-with-k6.js
        output: -

     scenarios: (100.00%) 3 scenarios, 201 max VUs, 10m30s max duration (incl. graceful stop):
              * ingester_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_ingester_query, gracefulStop: 30s)
              * store_gateway_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_storegateway_query, gracefulStop: 30s)
              * writing_metrics: Up to 0.05 iterations/s for 10m0s over 3 stages (maxVUs: 1, exec: write, gracefulStop: 30s)

     ✓ write worked

     █ ingester

       ✗ expected request status to equal 200
        ↳  2% — ✓ 8 / ✗ 375
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     █ store-gateway

       ✗ expected request status to equal 200
        ↳  99% — ✓ 496 / ✗ 1
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     checks........................................: 84.46%  ✓ 2045     ✗ 376  
     ✓ { type:ingester }...........................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:store-gateway }......................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:write }..............................: 100.00% ✓ 29       ✗ 0    
     data_received.................................: 166 kB  263 B/s
     data_sent.....................................: 843 kB  1.3 kB/s
     dropped_iterations............................: 204     0.323806/s
     group_duration................................: avg=1m16s    min=1.02s  med=1m10s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     http_req_blocked..............................: avg=287.11µs min=1µs    med=8µs   max=18.01ms p(90)=542.4µs p(95)=711.59µs
     http_req_connecting...........................: avg=256.06µs min=0s     med=0s    max=17.47ms p(90)=472.6µs p(95)=615.19µs
     http_req_duration.............................: avg=1m14s    min=7.89ms med=1m8s  max=2m0s    p(90)=2m0s    p(95)=2m0s    
       { expected_response:true }..................: avg=41.83s   min=7.89ms med=42.3s max=1m20s   p(90)=1m11s   p(95)=1m15s   
     ✗ { type:ingester }...........................: avg=1m57s    min=1.02s  med=2m0s  max=2m0s    p(90)=2m0s    p(95)=2m0s    
     ✗ { type:store-gateway }......................: avg=45.06s   min=10.03s med=44.5s max=2m0s    p(90)=1m12s   p(95)=1m16s   
     ✓ { url:http://localhost:8080/api/v1/push }...: avg=0s       min=0s     med=0s    max=0s      p(90)=0s      p(95)=0s      
     http_req_failed...............................: 41.36%  ✓ 376      ✗ 533  
     http_req_receiving............................: avg=38.31µs  min=0s     med=43µs  max=511µs   p(90)=81.2µs  p(95)=101µs   
     http_req_sending..............................: avg=39.23µs  min=7µs    med=28µs  max=811µs   p(90)=62µs    p(95)=85µs    
     http_req_tls_handshaking......................: avg=0s       min=0s     med=0s    max=0s      p(90)=0s      p(95)=0s      
     http_req_waiting..............................: avg=1m14s    min=7.77ms med=1m8s  max=2m0s    p(90)=2m0s    p(95)=2m0s    
     http_reqs.....................................: 909     1.442843/s
     iteration_duration............................: avg=1m14s    min=8.83ms med=1m8s  max=2m0s    p(90)=2m0s    p(95)=2m0s    
     iterations....................................: 909     1.442843/s
     vus...........................................: 118     min=2      max=168
     vus_max.......................................: 171     min=21     max=171


running (10m30.0s), 000/171 VUs, 909 complete and 118 interrupted iterations
ingester_queries      ✓ [======================================] 074/100 VUs  10m0s  1.00 iters/s
store_gateway_queries ✓ [======================================] 044/070 VUs  10m0s  1.00 iters/s
writing_metrics       ✓ [======================================] 0/1 VUs      10m0s  0.05 iters/s
ERRO[0631] thresholds on metrics 'http_req_duration{type:ingester}, http_req_duration{type:store-gateway}' have been crossed 

Screenshot 2024-06-28 at 5 28 16 PM

     █ ingester

       ✗ expected request status to equal 200
        ↳  2% — ✓ 8 / ✗ 375

     █ store-gateway

       ✗ expected request status to equal 200
        ↳  99% — ✓ 496 / ✗ 1

     ✗ { type:ingester }...........................: avg=1m57s    min=1.02s  med=2m0s  max=2m0s    p(90)=2m0s    p(95)=2m0s    
     ✗ { type:store-gateway }......................: avg=45.06s   min=10.03s med=44.5s max=2m0s    p(90)=1m12s   p(95)=1m16s   

This test didn't do any better than the previous ones, so I conclude that a simple round-robin between components is insufficient.

Test 4: Prioritized component per-worker

For this one, I implemented an idea based on my reading of the paper "2DFQ: Two-Dimensional Fair Queuing for Multi-Tenant Cloud Services".

The basic idea is to assign each querier worker a component to prioritize. When dequeuing a request, that worker always checks its prioritized component first before attempting to dequeue from other components. As long as each component has a worker assigned, we can guarantee that all requests will eventually be processed, and that requests for one component can't be blocked behind an arbitrarily long run of requests for other components.
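
Continuing the same hypothetical package as the round-robin sketch above (broker, errEmpty, and the component names are still invented; the real change is in the diff below), the per-worker priority idea might look roughly like this:

```go
// prioritizedBroker pins each querier worker to one component, which it always
// tries to dequeue from first, falling back to the other components only when
// its own is empty.
type prioritizedBroker struct {
	components []string          // e.g. "ingester", "store-gateway", "ingester-and-store-gateway"
	inner      map[string]broker // one wrapped broker per component
}

// componentForWorker assigns worker N to component N modulo the number of
// components, so every component has at least one dedicated worker as long as
// there are at least len(components) workers in total.
func (p *prioritizedBroker) componentForWorker(workerID int) string {
	return p.components[workerID%len(p.components)]
}

// dequeueForWorker tries the worker's own component first, then the others.
// A request for one component can therefore never sit behind an arbitrarily
// long run of requests for a different component.
func (p *prioritizedBroker) dequeueForWorker(workerID int) (any, error) {
	preferred := p.componentForWorker(workerID)
	if req, err := p.inner[preferred].dequeueNextRequest(); err == nil {
		return req, nil
	}
	for _, c := range p.components {
		if c == preferred {
			continue
		}
		if req, err := p.inner[c].dequeueNextRequest(); err == nil {
			return req, nil
		}
	}
	return nil, errEmpty
}
```

The intent is that, with concurrency = 4, every component always has at least one worker that tries it first, so no component can be starved even when another component's queue is backed up.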

Here's the diff from the round-robin example: diff

Full K6 Results

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: load-testing-with-k6.js
        output: -

     scenarios: (100.00%) 3 scenarios, 201 max VUs, 10m30s max duration (incl. graceful stop):
              * ingester_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_ingester_query, gracefulStop: 30s)
              * store_gateway_queries: 1.00 iterations/s for 10m0s (maxVUs: 10-100, exec: run_storegateway_query, gracefulStop: 30s)
              * writing_metrics: Up to 0.05 iterations/s for 10m0s over 3 stages (maxVUs: 1, exec: write, gracefulStop: 30s)

     ✓ write worked

     █ ingester

       ✓ expected request status to equal 200
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     █ store-gateway

       ✗ expected request status to equal 200
        ↳  64% — ✓ 266 / ✗ 149
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     checks........................................: 95.78%  ✓ 3382     ✗ 149  
     ✓ { type:ingester }...........................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:store-gateway }......................: 0.00%   ✓ 0        ✗ 0    
     ✓ { type:write }..............................: 100.00% ✓ 30       ✗ 0    
     data_received.................................: 273 kB  434 B/s
     data_sent.....................................: 882 kB  1.4 kB/s
     dropped_iterations............................: 134     0.212697/s
     group_duration................................: avg=47.03s   min=1s     med=26.25s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     http_req_blocked..............................: avg=132.66µs min=1µs    med=4µs    max=19.96ms p(90)=433.8µs p(95)=507.59µs
     http_req_connecting...........................: avg=112.86µs min=0s     med=0s     max=19.89ms p(90)=374.4µs p(95)=442.39µs
     http_req_duration.............................: avg=45.64s   min=6.25ms med=26.08s max=2m0s    p(90)=2m0s    p(95)=2m0s    
       { expected_response:true }..................: avg=32.87s   min=6.25ms med=24.95s max=1m59s   p(90)=1m21s   p(95)=1m40s   
     ✗ { type:ingester }...........................: avg=20.68s   min=1s     med=20.22s max=35.2s   p(90)=33.98s  p(95)=34.46s  
     ✗ { type:store-gateway }......................: avg=1m23s    min=5s     med=1m34s  max=2m0s    p(90)=2m0s    p(95)=2m0s    
     ✓ { url:http://localhost:8080/api/v1/push }...: avg=0s       min=0s     med=0s     max=0s      p(90)=0s      p(95)=0s      
     http_req_failed...............................: 14.65%  ✓ 149      ✗ 868  
     http_req_receiving............................: avg=84.28µs  min=0s     med=65µs   max=3.45ms  p(90)=124.4µs p(95)=162.79µs
     http_req_sending..............................: avg=32.14µs  min=4µs    med=22µs   max=507µs   p(90)=54.4µs  p(95)=77.19µs 
     http_req_tls_handshaking......................: avg=0s       min=0s     med=0s     max=0s      p(90)=0s      p(95)=0s      
     http_req_waiting..............................: avg=45.64s   min=6.17ms med=26.08s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     http_reqs.....................................: 1017    1.614275/s
     iteration_duration............................: avg=45.64s   min=6.97ms med=26.09s max=2m0s    p(90)=2m0s    p(95)=2m0s    
     iterations....................................: 1017    1.614275/s
     vus...........................................: 81      min=2      max=132
     vus_max.......................................: 135     min=21     max=135


running (10m30.0s), 000/135 VUs, 1017 complete and 81 interrupted iterations
ingester_queries      ✓ [======================================] 005/034 VUs  10m0s  1.00 iters/s
store_gateway_queries ✓ [======================================] 076/100 VUs  10m0s  1.00 iters/s
writing_metrics       ✓ [======================================] 0/1 VUs      10m0s  0.05 iters/s
ERRO[0631] thresholds on metrics 'http_req_duration{type:ingester}, http_req_duration{type:store-gateway}' have been crossed 

Screenshot 2024-07-01 at 11 20 38 AM

     █ ingester

       ✓ expected request status to equal 200

     █ store-gateway

       ✗ expected request status to equal 200
        ↳  64% — ✓ 266 / ✗ 149

     ✗ { type:ingester }...........................: avg=20.68s   min=1s     med=20.22s max=35.2s   p(90)=33.98s  p(95)=34.46s  
     ✗ { type:store-gateway }......................: avg=1m23s    min=5s     med=1m34s  max=2m0s    p(90)=2m0s    p(95)=2m0s    

This time the story is different: p95 ingester latency drops to about 34s (from 2m), and not a single ingester request failed. 36% of store-gateway requests failed, which we know is inevitable because there is not enough capacity in this test scenario.

Logiraptor · Jul 01 '24 15:07