Perf/use batching for DailyStatisticsFeeder
Description
- The DailyStatisticsFeeder makes a lot of requests to the database. This PR aims to batch those requests.
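To make the intent concrete, here is a minimal, self-contained sketch of the batching idea — the class and method names besides `flushToDatabase()` are hypothetical, not the actual Microcks API. Invocations only increment in-memory counters, and a flush drains them so they can be written in one bulk update instead of one database write per mock invocation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the batching idea, not the actual Microcks classes.
public class BatchingStatsFeeder {
   // Pending invocation counts, keyed here by "service:version" for illustration.
   private final ConcurrentMap<String, LongAdder> pending = new ConcurrentHashMap<>();

   /** Cheap in-memory increment instead of one database write per invocation. */
   public void recordInvocation(String serviceKey) {
      pending.computeIfAbsent(serviceKey, k -> new LongAdder()).increment();
   }

   /** Drain the buffer; the real feeder would issue one bulk upsert here. */
   public Map<String, Long> flushToDatabase() {
      Map<String, Long> batch = new HashMap<>();
      pending.forEach((key, counter) -> {
         long count = counter.sumThenReset();
         if (count > 0) {
            batch.put(key, count);
         }
      });
      return batch;
   }

   public static void main(String[] args) {
      BatchingStatsFeeder feeder = new BatchingStatsFeeder();
      for (int i = 0; i < 3; i++) {
         feeder.recordInvocation("API Pastry:2.0.0");
      }
      feeder.recordInvocation("API Eclair:1.0.0");
      Map<String, Long> batch = feeder.flushToDatabase();
      System.out.println(batch.get("API Pastry:2.0.0")); // 3
      System.out.println(batch.get("API Eclair:1.0.0")); // 1
   }
}
```

An explicit flush method also gives tests a hook to make pending counters visible in the database before asserting on them.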
Related issue(s)
Hi @Apoorva64, and thanks for this contribution!
This is well thought out and seems well executed; however, it breaks the "service interaction check" feature we've built on this mechanism (see https://microcks.io/blog/testcontainers-modules-0.3/#1-interaction-checks).
I think it can be fixed, as you have a flushToDatabase() method on the feeder that can be called on purpose when checking interactions.
But, before going further, I'd like to have more context on the reason you worked on this one and the limitations you observed:
- Did you collect some performance numbers?
- At what threshold does the lack of batching start to make things difficult in your use case?
- Have you tried setting up the index recommendations on the database? (see https://github.com/microcks/community/tree/main/tuning/mongodb-generic)
- Did the indexes bring you some performance boost? If not, why?
- Have you considered using a Virtual Thread instead of a pure OS thread?
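On that last point, a virtual-thread flusher could look like this minimal sketch (Java 21+; the thread body is a placeholder, not Microcks code):

```java
public class VirtualThreadFlusher {
   public static void main(String[] args) throws InterruptedException {
      // Virtual threads (Java 21+) are cheap to create and park, so a dedicated
      // flush loop no longer ties up a whole OS thread while it sleeps.
      Thread flusher = Thread.ofVirtual().name("stats-flusher").start(() -> {
         // A real feeder would loop here: sleep, then flush pending statistics.
         System.out.println("virtual=" + Thread.currentThread().isVirtual());
      });
      flusher.join(); // prints virtual=true
   }
}
```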
Thanks,
> Did you collect some performance numbers?

I've run quick perf tests with the following k6 scenarios:

```js
scenarios: {
  browse: {
    executor: 'constant-vus',
    exec: 'browse',
    vus: 20,
    duration: '1m',
  },
  invokeRESTMocks: {
    executor: 'per-vu-iterations',
    exec: 'invokeRESTMocks',
    vus: 80,
    iterations: 20000,
    startTime: '5s',
    maxDuration: '2m',
  },
}
```
with a `WAIT_TIME` of 0.1.
Tests on nightly Build without indexes
K6 Run
```
execution: local
script: .\benchmark\bench-microcks.js
output: Prometheus remote write (http://127.0.0.1:9091/api/v1/write)

scenarios: (100.00%) 2 scenarios, 100 max VUs, 2m35s max duration (incl. graceful stop):
  * browse: 20 looping VUs for 1m0s (exec: browse, gracefulStop: 30s)
  * invokeRESTMocks: 20000 iterations for each of 80 VUs (maxDuration: 2m0s, exec: invokeRESTMocks, startTime: 5s, gracefulStop: 30s)

✓ status code should be 200

█ REST API Tests
  ✓ pastryCall status is 200
  ✓ eclairCall status is 200
  ✓ eclairXmlCall status is 200
  ✓ eclairXmlCall response is XML
  ✓ millefeuilleCall status is 200

checks.........................: 100.00% ✓ 245779 ✗ 0
data_received..................: 144 MB 1.1 MB/s
data_sent......................: 26 MB 207 kB/s
dropped_iterations.............: 1551297 12393.702821/s
group_duration.................: avg=197.22ms min=125.29ms med=191.56ms max=462.81ms p(90)=229.97ms p(95)=249.04ms
http_req_blocked...............: avg=16.05µs min=0s med=0s max=9.11ms p(90)=0s p(95)=0s
http_req_connecting............: avg=9.85µs min=0s med=0s max=7.74ms p(90)=0s p(95)=0s
http_req_duration..............: avg=50ms min=2.39ms med=47.6ms max=367.4ms p(90)=61.25ms p(95)=68.54ms
  { expected_response:true }...: avg=50ms min=2.39ms med=47.6ms max=367.4ms p(90)=61.25ms p(95)=68.54ms
http_req_failed................: 0.00% ✓ 0 ✗ 201604
http_req_receiving.............: avg=105.16µs min=0s med=0s max=121.6ms p(90)=507.9µs p(95)=616µs
http_req_sending...............: avg=21.18µs min=0s med=0s max=20.07ms p(90)=0s p(95)=0s
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=49.88ms min=1.78ms med=47.48ms max=367.4ms p(90)=61.11ms p(95)=68.41ms
http_reqs......................: 201604 1610.665181/s
iteration_duration.............: avg=212.17ms min=125.29ms med=192.79ms max=808.52ms p(90)=244.49ms p(95)=313.14ms
iterations.....................: 50967 407.188212/s
vus............................: 80 min=20 max=100
vus_max........................: 100 min=100 max=100
```
FlameGraph

In the FlameGraph we can see that the `executeFindMultiInternal` and `executeFindOneInternal` methods (MongoDB calls) account for around 20% of the total time.
Tests on nightly Build with indexes
```
execution: local
script: .\benchmark\bench-microcks.js
output: Prometheus remote write (http://127.0.0.1:9091/api/v1/write)

scenarios: (100.00%) 2 scenarios, 100 max VUs, 2m35s max duration (incl. graceful stop):
  * browse: 20 looping VUs for 1m0s (exec: browse, gracefulStop: 30s)
  * invokeRESTMocks: 20000 iterations for each of 80 VUs (maxDuration: 2m0s, exec: invokeRESTMocks, startTime: 5s, gracefulStop: 30s)

✓ status code should be 200

█ REST API Tests
  ✓ pastryCall status is 200
  ✓ eclairCall status is 200
  ✓ eclairXmlCall status is 200
  ✓ eclairXmlCall response is XML
  ✓ millefeuilleCall status is 200

checks.........................: 100.00% ✓ 179894 ✗ 0
data_received..................: 109 MB 868 kB/s
data_sent......................: 19 MB 152 kB/s
dropped_iterations.............: 1564446 12497.5697/s
group_duration.................: avg=270.14ms min=124.49ms med=270.46ms max=482.5ms p(90)=317.89ms p(95)=332.49ms
http_req_blocked...............: avg=16.26µs min=0s med=0s max=10.66ms p(90)=0s p(95)=0s
http_req_connecting............: avg=10.31µs min=0s med=0s max=10.66ms p(90)=0s p(95)=0s
http_req_duration..............: avg=68.17ms min=2.88ms med=67.67ms max=386.14ms p(90)=94.88ms p(95)=103.82ms
  { expected_response:true }...: avg=68.17ms min=2.88ms med=67.67ms max=386.14ms p(90)=94.88ms p(95)=103.82ms
http_req_failed................: 0.00% ✓ 0 ✗ 148588
http_req_receiving.............: avg=102.88µs min=0s med=0s max=22.05ms p(90)=507.6µs p(95)=546.29µs
http_req_sending...............: avg=22.1µs min=0s med=0s max=13.06ms p(90)=0s p(95)=0s
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=68.04ms min=2.49ms med=67.54ms max=385.63ms p(90)=94.75ms p(95)=103.69ms
http_reqs......................: 148588 1186.994557/s
iteration_duration.............: avg=286.95ms min=124.49ms med=273.3ms max=774.09ms p(90)=334.1ms p(95)=464.57ms
iterations.....................: 37678 300.99053/s
vus............................: 80 min=20 max=100
vus_max........................: 100 min=100 max=100

running (2m05.2s), 000/100 VUs, 37678 complete and 0 interrupted iterations
browse          ✓ [======================================] 20 VUs 1m0s
invokeRESTMocks ✓ [======================================] 80 VUs 2m0s 0035554/1600000 iters, 20000 per VU
```
FlameGraph

MongoDB indexes don't seem to impact performance much.
Tests on use-batching-for-metrics without indexes
```
execution: local
script: .\benchmark\bench-microcks.js
output: Prometheus remote write (http://127.0.0.1:9091/api/v1/write)

scenarios: (100.00%) 2 scenarios, 100 max VUs, 2m35s max duration (incl. graceful stop):
  * browse: 20 looping VUs for 1m0s (exec: browse, gracefulStop: 30s)
  * invokeRESTMocks: 20000 iterations for each of 80 VUs (maxDuration: 2m0s, exec: invokeRESTMocks, startTime: 5s, gracefulStop: 30s)

✓ status code should be 200

█ REST API Tests
  ✓ pastryCall status is 200
  ✓ eclairCall status is 200
  ✓ eclairXmlCall status is 200
  ✓ eclairXmlCall response is XML
  ✓ millefeuilleCall status is 200

checks.........................: 100.00% ✓ 664408 ✗ 0
data_received..................: 367 MB 2.9 MB/s
data_sent......................: 69 MB 554 kB/s
dropped_iterations.............: 1467733 11737.233652/s
group_duration.................: avg=72.5ms min=39.63ms med=70.83ms max=183.26ms p(90)=86.61ms p(95)=93.09ms
http_req_blocked...............: avg=24.52µs min=0s med=0s max=26.52ms p(90)=0s p(95)=0s
http_req_connecting............: avg=16.37µs min=0s med=0s max=26.52ms p(90)=0s p(95)=0s
http_req_duration..............: avg=18.03ms min=2.16ms med=17.15ms max=371.76ms p(90)=23.78ms p(95)=26.67ms
  { expected_response:true }...: avg=18.03ms min=2.16ms med=17.15ms max=371.76ms p(90)=23.78ms p(95)=26.67ms
http_req_failed................: 0.00% ✓ 0 ✗ 538287
http_req_receiving.............: avg=145.28µs min=0s med=0s max=42.38ms p(90)=511.9µs p(95)=1ms
http_req_sending...............: avg=32.73µs min=0s med=0s max=28.79ms p(90)=0s p(95)=0s
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=17.85ms min=2.04ms med=16.97ms max=371.76ms p(90)=23.57ms p(95)=26.43ms
http_reqs......................: 538287 4304.597833/s
iteration_duration.............: avg=79.8ms min=39.63ms med=71.16ms max=817.16ms p(90)=88.88ms p(95)=99.01ms
iterations.....................: 135340 1082.293035/s
vus............................: 80 min=20 max=100
vus_max........................: 100 min=100 max=100

running (2m05.0s), 000/100 VUs, 135340 complete and 0 interrupted iterations
browse          ✓ [======================================] 20 VUs 1m0s
invokeRESTMocks ✓ [======================================] 80 VUs 2m0s 0132267/1600000 iters, 20000 per VU
```
FlameGraph

In the FlameGraph we can see that the `executeFindMultiInternal` and `executeFindOneInternal` methods (MongoDB calls) are gone.
I'll answer the rest of the questions and provide a more detailed explanation with a reproducible environment during the coming week.
> MongoDB indexes don't seem to impact performance much

It even seems to be worse 🤣 What MongoDB are you using? The one we provide? Executed in Docker Compose? A remote one? Have you checked that the indexes on the daily statistics collection have been correctly created?
I would also be interested in how you first discovered this problem. What was the use case you were trying to address? How many services/hits/users made you trigger this exploration and discover this optimisation?
I'm using the MongoDB from the compose file. I'm currently building a setup with Grafana and some dashboards to show the problems. Yes, the indexes on the daily statistics collection are created correctly (the tests on them are green).
I didn't start from a specific use case for this optimisation. I just saw that Microcks was too slow and could clearly be faster with a few small changes xD