Response refactor, increase requests-per-second
Description
This PR refactors response generation, speeding up operations with various body types and sizes. Comparison using the the code from PR #2895:
──────────────────────────────────────────────────────────────────────────── tcp6
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6 -Y
Body ────────── req/sec ────────── ────── req 50% times ──────── Master
KB array chunk string io array chunk string io
1 13662 9887 13552 7799 0.71 1.30 0.69 1.46
10 5430 2618 12787 3902 2.06 4.88 0.94 2.90
100 835 337 9945 610 13.30 32.55 1.13 17.63
256 350 135 8421 247 30.67 79.37 1.28 42.64
512 188 66 6784 123 56.31 157.21 1.60 84.97
1024 101 33 3717 63 102.29 299.51 2.19 163.60
2048 53 17 2524 32 189.07 588.73 4.30 315.08
─────────────────────────────────────────────────────────────────────
Body ────────── req/sec ────────── ─────── req 50% times ─────── PR #2896
KB array chunk string io array chunk string io
1 13985 13870 14013 9697 0.727 0.732 0.726 1.160
10 13715 13483 13717 9325 0.741 0.755 0.741 1.200
100 11400 10353 11172 9388 0.940 1.040 0.970 1.130
256 9105 7640 8549 9305 1.150 1.380 1.190 1.140
512 6759 5445 6776 8997 1.520 1.920 1.520 1.100
1024 4018 3348 4050 9015 1.860 2.740 2.660 1.150
2048 2340 1783 2438 8823 4.320 6.130 4.410 1.180
─────────────────────────────────────────────────────────────────────
There are significant speed increases for almost all body types/sizes, with the exception of 'single string' bodies, which show smaller increases.
The commits:
-
[io_buffer.rb - change to using StringIO](https://github.com/puma/puma/commit/8ca473f7b0c0da4d1a3bda06be0d9500e5dbe5f1)- allows IOBuffer to be used both as a string and as an IO. Previously it was subclassed fromString. -
request.rb - refactor - Split
fast_writeintofast_write_responseandfast_write_str. Refactor. Compute array bodies'Content-Lengthonly when it's not provided by app. See #2892 by @guilleiguaran. Only an Array responds to#to_ary, so it is used for the body type check.
All three of the below are small refactors
- test_persistent.rb - chunked tests must be enum, not array
- test_rack_server.rb - Add Rack::BodyProxy tests - from PR #2892.
- test_puma_server.rb - always start one thread
Please feel free to test these changes.
The branch at https://github.com/MSP-Greg/puma/tree/00-response-refactor-bench contains the code for both this PR and PR #2895.
Closes #2892, closes #2703
Your checklist for this pull request
- [x] I have reviewed the guidelines for contributing to this repository.
- [ ] I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
- [ ] My pull request is 100 lines added/removed or less so that it can be easily reviewed.
- [ ] If this PR doesn't need tests (docs change), I added
[ci skip]to the title of the PR. - [x] If this closes any issues, I have added "Closes
#issue" to the PR description or my commit messages. - [x] I have updated the documentation accordingly.
- [x] All new and existing tests passed, including Rubocop.
Below is data for ssl connections. Again, significant speed increases in most of the body types/sizes.
──────────────────────────────────────────────────────────────────────────── ssl
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Body ────────── req/sec ────────── ─────── req 50% times ─────── Master
KB array chunk string io array chunk string io
1 11090 7303 10931 7657 1.34 1.28 1.32 1.65
10 2267 2263 9755 3911 4.44 4.34 1.20 3.00
100 308 309 4151 673 38.56 38.93 2.96 16.38
256 125 123 2165 275 92.57 92.01 5.19 38.25
512 63 62 1191 145 179.65 178.39 9.07 75.12
1024 31 31 617 74 323.35 315.68 17.02 145.23
2048 15 15 320 38 632.93 646.78 32.60 264.60
─────────────────────────────────────────────────────────────────────
Body ────────── req/sec ────────── ─────── req 50% times ─────── PR #2896
KB array chunk string io array chunk string io
1 13802 13492 13820 9576 0.839 0.850 0.838 1.220
10 13131 12708 13067 9077 0.890 0.910 0.890 1.270
100 3799 4312 4482 2772 2.590 2.400 2.320 3.810
256 2177 2029 2231 1460 4.690 4.970 4.560 7.080
512 1175 1110 1196 798 8.400 9.030 8.250 12.800
1024 611 576 637 416 16.040 17.220 15.680 24.320
2048 305 290 325 215 32.290 34.010 30.720 46.800
─────────────────────────────────────────────────────────────────────
ruby 3.2.0dev (2022-06-14T01:21:55Z master 048f14221c) +YJIT [x86_64-linux]
Using Windows WSL2/Ubuntu 22.04 with OpenSSL 3.0.2
All of the above was done with Windows WSL2/Ubuntu 22.04. But, that's not native, essentially running in a VM.
So, I ran the benchmarks on GitHub Actions, which are 2 core VM's running on larger systems. They list several processors, I looked up two, one had 12 cores, another 32. I ran the test a few times to account for 'noisy neighbors'. Data below. Again, almost all response body type/size combinations run faster than current master...
wrk -t5 -c10 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -T5 -c2 -Y
Server cluster mode -w2 -t5:5, bind: tcp
ruby 3.2.0dev (2022-06-16T16:48:52Z master cd5cafa4a3) +YJIT [x86_64-linux]
═════════════════════════════════════════════════════════════════════ Master
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 15190 13626 14527 12669 0.233 0.614 0.551 0.257
10 11129 10219 13308 10879 0.228 0.846 0.434 0.337
100 3454 3310 7736 4412 2.700 2.960 1.190 1.990
256 1595 1493 4513 2252 6.080 6.470 1.920 3.830
512 862 803 2626 1258 11.340 10.970 3.430 7.810
1024 439 419 1517 668 24.280 23.980 6.110 14.820
2048 221 213 733 345 44.230 47.190 11.850 26.920
─────────────────────────────────────────────────────────────────────
═════════════════════════════════════════════════════════════════════ PR #2896
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 16005 15106 16250 13138 0.570 0.601 0.568 0.649
10 15634 14384 15077 12620 0.550 0.636 0.547 0.737
100 8499 7470 8817 7038 0.980 1.140 1.040 0.626
256 5215 4131 5281 6813 1.800 2.280 1.810 1.030
512 3133 2464 3155 5339 2.710 3.810 2.880 1.710
1024 1730 1336 1838 3419 5.160 6.540 5.120 2.800
2048 884 688 954 1942 9.760 11.880 9.760 4.860
─────────────────────────────────────────────────────────────────────
wrk latency data, grouped by response body size
═════════════════════════════════════════════════════════════════════ 1kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 15190 0.23 4.11 9.11 15.18 44.35 1022
chunk 13626 0.61 1.56 4.80 12.35 24.82 1040
string 14527 0.55 2.05 5.62 14.66 38.41 1022
io 12669 0.26 4.27 9.08 15.32 37.56 1022
───────────────────────────────────────────────────────────────── PR #2896
array 16005 0.57 0.65 0.96 3.27 40.51 1022
chunk 15106 0.60 0.72 1.00 2.71 41.34 1040
string 16250 0.57 0.64 0.87 2.37 18.10 1022
io 13138 0.65 0.88 1.59 4.47 27.82 1022
═════════════════════════════════════════════════════════════════════ 10kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 11129 0.23 4.44 9.49 15.43 75.46 10318
chunk 10219 0.85 2.00 4.71 12.26 33.53 10318
string 13308 0.43 3.33 8.82 16.75 35.13 10238
io 10879 0.34 4.39 9.84 18.99 34.19 10238
───────────────────────────────────────────────────────────────── PR #2896
array 15634 0.55 0.71 1.48 3.84 26.45 10239
chunk 14384 0.64 0.73 1.03 2.70 42.92 10318
string 15077 0.55 0.74 1.47 3.83 19.55 10238
io 12620 0.74 0.85 1.14 2.76 28.74 10239
═════════════════════════════════════════════════════════════════════ 100kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 3454 2.70 3.27 3.88 6.41 38.69 103104
chunk 3310 2.96 3.14 3.57 5.69 32.04 103109
string 7736 1.19 1.39 1.79 4.99 42.35 102398
io 4412 1.99 2.50 3.91 10.31 31.69 102398
───────────────────────────────────────────────────────────────── PR #2896
array 8499 0.98 1.48 2.44 7.07 21.90 102398
chunk 7470 1.14 1.53 2.49 6.84 33.93 103107
string 8817 1.04 1.19 1.56 3.99 43.69 102446
io 7038 0.63 17.34 33.80 44.19 63.60 102398
═════════════════════════════════════════════════════════════════════ 256kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 1595 6.08 6.49 7.41 12.81 49.06 263945
chunk 1493 6.47 7.03 8.10 15.22 74.17 263944
string 4513 1.92 2.43 3.95 11.27 35.72 262248
io 2252 3.83 5.49 9.87 24.42 60.74 262139
───────────────────────────────────────────────────────────────── PR #2896
array 5215 1.80 2.01 2.46 5.33 58.87 262138
chunk 4131 2.28 2.58 3.03 5.59 25.88 263972
string 5281 1.81 2.03 2.40 5.08 21.45 262155
io 6813 1.03 4.07 24.08 42.82 57.61 262139
═════════════════════════════════════════════════════════════════════ 512kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 862 11.34 11.94 13.37 20.81 44.18 527877
chunk 803 10.97 13.85 20.43 76.32 114.16 527884
string 2626 3.43 4.19 6.00 15.99 40.05 524363
io 1258 7.81 10.53 14.66 25.06 64.31 524290
───────────────────────────────────────────────────────────────── PR #2896
array 3133 2.71 3.63 6.26 16.88 37.30 524244
chunk 2464 3.81 4.41 5.39 9.98 27.48 528044
string 3155 2.88 3.34 4.83 13.47 39.83 524324
io 5339 1.71 2.29 12.73 41.39 69.71 524428
═════════════════════════════════════════════════════════════════════ 1024kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 439 24.28 26.15 29.12 37.36 54.84 1055750
chunk 419 23.98 25.56 27.65 38.95 61.14 1055735
string 1517 6.11 7.29 9.77 15.31 39.23 1048590
io 668 14.82 17.03 19.49 29.18 52.12 1048582
───────────────────────────────────────────────────────────────── PR #2896
array 1730 5.16 6.89 9.78 35.59 62.04 1048535
chunk 1336 6.54 8.45 13.55 40.03 90.03 1056056
string 1838 5.12 5.82 7.62 13.64 39.44 1048488
io 3419 2.80 3.45 4.86 38.38 61.40 1048474
═════════════════════════════════════════════════════════════════════ 2048kB
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── Master
array 221 44.23 47.96 52.12 120.97 347.25 2111458
chunk 213 47.19 51.63 55.62 64.85 95.46 2111497
string 733 11.85 17.93 27.84 64.10 147.48 2096385
io 345 26.92 31.42 43.47 240.98 314.45 2097178
───────────────────────────────────────────────────────────────── PR #2896
array 884 9.76 14.96 24.83 60.45 105.23 2097201
chunk 688 11.88 18.85 42.21 99.54 154.52 2110244
string 954 9.76 11.24 13.92 24.41 69.11 2096501
io 1942 4.86 6.20 8.41 36.39 64.73 2097527
Just some comments from the phone, haven't looked through the actual changes. Not sure when I'll be able to do that.
@MSP-Greg I think is waiting on some variable renames but otherwise looks good to go?
@nateberkopec working on this re PR #2892.
Very busy with a (paid) project fixing Ruby code I wrote 10+ years ago. I was pretty new to Ruby then, mostly did C# and js back then...
Question: Should an empty response body return a Content-Length: 0 header in the response?
That's a good question, HTTP allows it and some clients might want to know it (e.g for verification), but I'm not sure how common is having it for empty body responses.
@guilleiguaran I updated this, adding checks for if the body is a File io without a Content-Length.
Thanks for all your help here and in PR #2892...
Also, please see https://github.com/puma/puma/pull/2896/commits/eb5d7a35ce8b, that's the code similar to your PR. I start with app_body, and convert it to body. close is called on app_body at the end of the response handling.
From what I can tell, Content-Length: 0 is allowed as a value, and it's passed in the response. Some tests needed changes for it...
I'm not sure how common is having it for empty body responses.
Now it's being set for a body that's either [] or ['']
Rebased for Rack 3, added commit test_rack_server.rb - add tests for Rack::Chunked middleware
I updated the timing data shown in the first post. This PR has always used IO.copy_stream when response bodies were 'IO like'. That was updated in master with PR #2923, but this PR seems to be a few percent faster. The larger improvements when bodies are arrays or enums still exist compared to master.
I need to squash some commits and I'll push with a current rebase on master.
First, sorry the failure, mixed up a rebase. I hate it when that happens.
Now that the benchmark code is in master, if anyone can test this, it would be helpful, as locally I'm using WSL2/Ubuntu.
The 'default' test run takes about 5 minutes and generates all the data in the first post. You need wrk in your path.
Build/compile Puma master, and run
# IPv6
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6
# IPv4
benchmarks/local/response_time_wrk.sh -w2 -t5:5
This PR doesn't make any changes to the extension code, so you can switch to the PR branch and rerun the tests.
To run just a particular body type, the below example runs a 'file' body (i for io):
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -bi
Thanks in advance.
m1 macbook pro macos 12.5.1
master - ruby 3.1.0
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 18342 19389 20278 14428 0.48 0.50 0.46 0.60
10 14419 9414 18349 14362 0.65 1.24 0.49 0.62
100 4674 1828 13650 14416 2.31 6.14 0.66 0.62
256 2104 834 11236 13639 5.59 14.79 0.81 0.66
512 1108 414 6643 10097 11.29 30.53 1.34 0.84
1024 563 195 4071 5918 23.03 65.35 2.37 1.45
2048 231 82 1932 2610 44.87 124.47 5.34 3.52
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6
Server cluster mode -w2 -t5:5, bind: tcp6
Puma repo branch master
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
this pr - ruby 3.1.0
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 24542 23149 23006 15973 0.378 0.401 0.396 0.531
10 20703 19842 20825 16173 0.410 0.425 0.415 0.536
100 15954 13145 15236 14753 0.535 0.638 0.570 0.576
256 11202 8415 11325 13684 0.761 1.020 0.780 0.750
512 8226 5723 7644 11047 1.270 1.610 1.190 0.760
1024 4291 3046 4117 5309 1.920 2.870 2.600 1.590
2048 2424 1834 2368 2389 3.760 5.230 4.170 3.770
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6
Server cluster mode -w2 -t5:5, bind: tcp6
Puma repo branch 00-response-refactor
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
master - ruby 2.7.6
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 21411 19572 10786 15265 0.43 0.48 0.92 0.58
10 15530 10352 9982 15690 0.59 0.93 0.93 0.55
100 4959 2014 10186 14373 2.04 5.77 0.93 0.62
256 2199 806 10029 13493 5.35 15.32 0.95 0.66
512 1213 401 10125 10550 10.39 30.90 0.93 0.80
1024 597 226 10119 5716 21.59 57.79 0.94 1.51
2048 285 116 10601 3103 44.15 105.27 0.92 2.89
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6
Server cluster mode -w2 -t5:5, bind: tcp6
Puma repo branch master
ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [arm64-darwin21]
this pr - ruby 2.7.6
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 25633 26126 11252 18638 0.360 0.353 0.849 0.543
10 23257 23040 11272 18224 0.366 0.375 0.850 0.474
100 17578 15364 10246 15037 0.479 0.548 0.890 0.559
256 12490 9535 10391 14082 0.668 0.880 0.870 0.720
512 8315 6743 11556 9646 0.990 1.290 0.844 0.776
1024 4721 3806 10049 5196 1.650 2.230 0.920 1.560
2048 2526 2195 10145 2428 3.290 4.070 0.950 3.640
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s tcp6
Server cluster mode -w2 -t5:5, bind: tcp6
Puma repo branch 00-response-refactor
ruby 2.7.6p219 (2022-04-12 revision c9c2245c0a) [arm64-darwin21]
@agbaber
Thank you for the results, sorry they take so long. For the most part, your data tracks with mine, in that many of the speed increases are significant.
I'm wondering if something is wrong with the code, as the Ruby 2.7 string data is much different that Ruby 3.1. I've used Ruby master for most of my local testing.
Do you recall if latency summary section output before the main summary shown above listed the string body sizes as similar to the other sized bodies? Example from my output:
─────────────────────────────────────────────────────────────────────────────────── 2048kB
array 39 266.0 286.6 317.5 1510.0 1940.0 2097155
chunk 16 633.1 675.3 715.9 1370.0 1460.0 2111471
string 2627 3.9 15.1 33.2 51.2 77.8 2097409
io 7682 1.4 6.8 15.4 23.8 65.2 2097244
JFYI, the body size number isn't terribly accurate due to the limited digits in wrk's 'bytes read' output.
EDIT: I'm going to test on Ruby 2.7 locally.
@agbaber Thanks again. I found the bug, Ruby 3 doesn't raise a "can't modify a frozen string" error, but Ruby 2.7 does. That was the cause of the odd string data in Ruby 2.7. Fixed in PR #2951. Rebased this, so it has the change also...
Late to the party, but I tried this on Ruby 3.2.0 Preview 2. Following are the results.
My laptop information
ProductName: macOS
ProductVersion: 12.4
BuildVersion: 21F79
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21] without YJIT
master - 0e832e71ef
$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 20293 17969 22868 18407 0.36 0.60 0.53 0.57
10 11181 4149 19329 16614 0.74 2.19 0.42 0.65
100 1036 236 8728 7782 8.91 69.89 1.03 1.37
256 905 272 4574 4199 7.71 21.60 1.89 2.37
512 431 149 2729 2326 15.20 41.19 3.55 4.01
1024 172 30 1428 1261 35.52 103.26 6.54 7.58
2048 47 16 759 632 155.84 646.02 12.67 15.15
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch master
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]
View Full Result
$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
bundle exec bin/puma -q -b ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none -S tmp/bench_test_puma.state -w2 -t5:5 --control-url=tcp://127.0.0.1:40010 --control-token=test test/rackup/ci_select.ru
usage: sleep seconds
ruby: warning: Ruby was built without YJIT support
ruby: warning: Ruby was built without YJIT support
[14328] Puma starting in cluster mode...
[14328] * Puma version: 5.6.5 (ruby 3.2.0-p-1) ("Birdie's Version")
[14328] * Min threads: 5
[14328] * Max threads: 5
[14328] * Environment: development
[14328] * Master PID: 14328
[14328] * Workers: 2
[14328] * Restarts: (✔) hot (✔) phased
[14328] * Listening on ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none
[14328] Use Ctrl-C to stop
[14328] * Starting control server on http://127.0.0.1:40010
[14328] - Worker 1 (PID: 14333) booted in 0.0s, phase: 0
[14328] - Worker 0 (PID: 14332) booted in 0.01s, phase: 0
warm-up
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: a10' https://[::1]:40001 | 13918 requests in 1.10s, 159.46MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: c10' https://[::1]:40001 | 6163 requests in 1.10s, 71.10MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: s10' https://[::1]:40001 | 22974 requests in 1.10s, 263.20MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: i10' https://[::1]:40001 | 18741 requests in 1.10s, 214.71MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1' https://[::1]:40001 | 204979 requests in 10.10s, 546.71MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1' https://[::1]:40001 | 181543 requests in 10.10s, 487.25MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1' https://[::1]:40001 | 230961 requests in 10.10s, 615.87MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1' https://[::1]:40001 | 185924 requests in 10.10s, 495.81MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a10' https://[::1]:40001 | 112923 requests in 10.10s, 1.26GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c10' https://[::1]:40001 | 41905 requests in 10.10s, 483.31MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s10' https://[::1]:40001 | 195247 requests in 10.10s, 2.18GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i10' https://[::1]:40001 | 167796 requests in 10.10s, 1.88GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a100' https://[::1]:40001 | 10461 requests in 10.10s, 1.02GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c100' https://[::1]:40001 | 2371 requests in 10.03s, 237.69MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s100' https://[::1]:40001 | 88165 requests in 10.10s, 8.55GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i100' https://[::1]:40001 | 78638 requests in 10.10s, 7.63GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a256' https://[::1]:40001 | 9139 requests in 10.10s, 2.25GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c256' https://[::1]:40001 | 2743 requests in 10.10s, 696.48MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s256' https://[::1]:40001 | 46207 requests in 10.10s, 11.36GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i256' https://[::1]:40001 | 42425 requests in 10.10s, 10.43GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a512' https://[::1]:40001 | 4351 requests in 10.10s, 2.13GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c512' https://[::1]:40001 | 1509 requests in 10.10s, 764.44MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s512' https://[::1]:40001 | 27568 requests in 10.10s, 13.51GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i512' https://[::1]:40001 | 23491 requests in 10.10s, 11.51GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1024' https://[::1]:40001 | 1733 requests in 10.10s, 1.70GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1024' https://[::1]:40001 | 298 requests in 10.10s, 304.86MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1024' https://[::1]:40001 | 14428 requests in 10.10s, 14.12GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1024' https://[::1]:40001 | 12739 requests in 10.10s, 12.47GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a2048' https://[::1]:40001 | 473 requests in 10.04s, 0.94GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c2048' https://[::1]:40001 | 158 requests in 10.10s, 325.42MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s2048' https://[::1]:40001 | 7668 requests in 10.10s, 15.00GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i2048' https://[::1]:40001 | 6388 requests in 10.10s, 12.50GB read
Type req/sec 50% 75% 90% 99% 100% Resp Size Errors
─────────────────────────────────────────────────────────────────────────────────── 1kB
array 20293 0.4 0.5 0.8 3.4 11.0 1023
chunk 17969 0.6 1.1 2.8 5.4 11.6 1040
string 22868 0.5 1.0 2.4 4.3 13.2 1022
io 18407 0.6 0.9 3.2 6.0 15.2 1022
─────────────────────────────────────────────────────────────────────────────────── 10kB
array 11181 0.7 1.0 1.3 4.4 15.8 10239
chunk 4149 2.2 7.1 19.5 56.0 96.4 10319
string 19329 0.4 0.6 1.2 4.3 12.1 10238
io 16614 0.7 1.1 3.2 5.9 15.6 10239
─────────────────────────────────────────────────────────────────────────────────── 100kB
array 1036 8.9 23.5 72.0 214.2 313.9 102444
chunk 236 69.9 100.9 275.5 496.6 638.2 103344
string 8728 1.0 1.4 2.5 9.4 21.4 102400
io 7782 1.4 2.2 8.0 14.6 28.2 102400
─────────────────────────────────────────────────────────────────────────────────── 256kB
array 905 7.7 15.5 25.4 51.5 286.1 262277
chunk 272 21.6 64.8 103.7 337.2 1050.0 264470
string 4574 1.9 2.4 2.9 7.2 28.5 262109
io 4199 2.4 3.1 9.0 20.6 32.4 262148
─────────────────────────────────────────────────────────────────────────────────── 512kB
array 431 15.2 37.5 59.6 155.7 473.8 524888
chunk 149 41.2 110.0 174.3 365.3 1010.0 529420
string 2729 3.6 4.3 7.1 36.7 47.4 524522
io 2326 4.0 4.7 5.4 12.3 54.6 524349
─────────────────────────────────────────────────────────────────────────────────── 1024kB
array 172 35.5 100.7 140.7 869.0 1710.0 1051747
chunk 30 103.3 708.9 799.9 913.7 1470.0 1070940
string 1428 6.5 7.9 9.6 26.2 67.5 1049204
io 1261 7.6 8.5 9.5 18.6 73.1 1048938
─────────────────────────────────────────────────────────────────────────────────── 2048kB
array 47 155.8 425.9 523.4 1080.0 1580.0 2121956
chunk 16 646.0 806.9 888.2 1110.0 1160.0 2157877
string 759 12.7 14.6 16.6 31.6 140.6 2098010
io 632 15.2 17.0 20.4 36.0 173.2 2099701
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 20293 17969 22868 18407 0.36 0.60 0.53 0.57
10 11181 4149 19329 16614 0.74 2.19 0.42 0.65
100 1036 236 8728 7782 8.91 69.89 1.03 1.37
256 905 272 4574 4199 7.71 21.60 1.89 2.37
512 431 149 2729 2326 15.20 41.19 3.55 4.01
1024 172 30 1428 1261 35.52 103.26 6.54 7.58
2048 47 16 759 632 155.84 646.02 12.67 15.15
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch master
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]
[14328] - Gracefully shutting down workers...
[14328] === puma shutdown: 2022-09-13 20:58:46 +0900 ===
[14328] - Goodbye!
5:14 Total Time
This PR - c32f8ff9b
$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 23522 24581 25445 21879 0.405 0.424 0.455 0.537
10 22458 21593 22811 19156 0.472 0.430 0.487 0.568
100 7772 7431 8768 6853 1.100 1.330 1.230 1.210
256 4089 3646 4731 4037 2.150 2.410 2.050 2.210
512 2303 2021 2474 2244 3.960 4.560 3.730 4.100
1024 1170 1075 1442 1262 7.930 8.840 6.510 7.570
2048 565 538 755 656 16.640 17.940 12.770 14.790
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch 00-response-refactor
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]
View Full Result
$ benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
bundle exec bin/puma -q -b ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none -S tmp/bench_test_puma.state -w2 -t5:5 --control-url=tcp://127.0.0.1:40010 --control-token=test test/rackup/ci_select.ru
usage: sleep seconds
ruby: warning: Ruby was built without YJIT support
ruby: warning: Ruby was built without YJIT support
[13614] Puma starting in cluster mode...
[13614] * Puma version: 5.6.5 (ruby 3.2.0-p-1) ("Birdie's Version")
[13614] * Min threads: 5
[13614] * Max threads: 5
[13614] * Environment: development
[13614] * Master PID: 13614
[13614] * Workers: 2
[13614] * Restarts: (✔) hot (✔) phased
[13614] * Listening on ssl://[::1]:40001?cert=examples/puma/cert_puma.pem&key=examples/puma/puma_keypair.pem&verify_mode=none
[13614] Use Ctrl-C to stop
[13614] * Starting control server on http://127.0.0.1:40010
[13614] - Worker 0 (PID: 13628) booted in 0.01s, phase: 0
[13614] - Worker 1 (PID: 13629) booted in 0.01s, phase: 0
warm-up
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: a10' https://[::1]:40001 | 26562 requests in 1.10s, 304.31MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: c10' https://[::1]:40001 | 26710 requests in 1.10s, 308.04MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: s10' https://[::1]:40001 | 26873 requests in 1.10s, 307.87MB read
wrk -t2 -c4 -d1s --latency -H 'Body-Conf: i10' https://[::1]:40001 | 21684 requests in 1.10s, 248.42MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1' https://[::1]:40001 | 237588 requests in 10.10s, 633.59MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1' https://[::1]:40001 | 248277 requests in 10.10s, 666.33MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1' https://[::1]:40001 | 257024 requests in 10.10s, 685.37MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1' https://[::1]:40001 | 220993 requests in 10.10s, 589.28MB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a10' https://[::1]:40001 | 226841 requests in 10.10s, 2.54GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c10' https://[::1]:40001 | 218111 requests in 10.10s, 2.46GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s10' https://[::1]:40001 | 230415 requests in 10.10s, 2.58GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i10' https://[::1]:40001 | 193550 requests in 10.10s, 2.17GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a100' https://[::1]:40001 | 78517 requests in 10.10s, 7.62GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c100' https://[::1]:40001 | 75054 requests in 10.10s, 7.33GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s100' https://[::1]:40001 | 88591 requests in 10.10s, 8.60GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i100' https://[::1]:40001 | 69228 requests in 10.10s, 6.72GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a256' https://[::1]:40001 | 41302 requests in 10.10s, 10.15GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c256' https://[::1]:40001 | 36824 requests in 10.10s, 9.11GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s256' https://[::1]:40001 | 47792 requests in 10.10s, 11.75GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i256' https://[::1]:40001 | 40768 requests in 10.10s, 10.02GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a512' https://[::1]:40001 | 23257 requests in 10.10s, 11.40GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c512' https://[::1]:40001 | 20412 requests in 10.10s, 10.07GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s512' https://[::1]:40001 | 24992 requests in 10.10s, 12.25GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i512' https://[::1]:40001 | 22664 requests in 10.10s, 11.11GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a1024' https://[::1]:40001 | 11815 requests in 10.10s, 11.56GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c1024' https://[::1]:40001 | 10864 requests in 10.10s, 10.70GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s1024' https://[::1]:40001 | 14565 requests in 10.10s, 14.25GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i1024' https://[::1]:40001 | 12749 requests in 10.10s, 12.47GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: a2048' https://[::1]:40001 | 5656 requests in 10.01s, 11.06GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: c2048' https://[::1]:40001 | 5433 requests in 10.10s, 10.70GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: s2048' https://[::1]:40001 | 7628 requests in 10.10s, 14.92GB read
wrk -t8 -c16 -d10s --latency -H 'Body-Conf: i2048' https://[::1]:40001 | 6625 requests in 10.10s, 12.96GB read
Type req/sec 50% 75% 90% 99% 100% Resp Size
───────────────────────────────────────────────────────────────── 1kB
array 23522 0.40 0.63 1.66 3.50 10.67 1022
chunk 24581 0.42 0.66 2.27 4.34 11.19 1040
string 25445 0.46 0.76 2.06 3.75 13.53 1022
io 21879 0.54 0.95 3.15 5.15 13.91 1022
───────────────────────────────────────────────────────────────── 10kB
array 22458 0.47 0.74 2.02 3.77 11.16 10238
chunk 21593 0.43 0.65 1.69 4.07 16.12 10318
string 22811 0.49 0.79 2.20 4.11 11.73 10238
io 19156 0.57 0.96 2.70 5.07 16.60 10238
───────────────────────────────────────────────────────────────── 100kB
array 7772 1.10 1.50 2.18 11.60 28.77 102403
chunk 7431 1.33 1.89 5.60 12.58 22.79 103110
string 8768 1.23 2.04 5.79 10.36 18.90 102445
io 6853 1.21 1.58 2.02 6.84 26.84 102406
───────────────────────────────────────────────────────────────── 256kB
array 4089 2.15 2.67 3.22 6.04 32.39 262218
chunk 3646 2.41 3.00 3.61 7.66 30.20 263806
string 4731 2.05 2.65 6.64 21.66 34.47 262144
io 4037 2.21 2.69 3.22 7.24 30.38 262079
───────────────────────────────────────────────────────────────── 512kB
array 2303 3.96 4.75 5.62 11.08 57.11 524597
chunk 2021 4.56 5.54 6.55 11.35 57.56 528084
string 2474 3.73 4.57 5.40 9.28 42.29 524419
io 2244 4.10 4.88 5.85 10.87 61.56 524574
───────────────────────────────────────────────────────────────── 1024kB
array 1170 7.93 9.37 11.32 21.85 95.88 1048425
chunk 1075 8.84 10.37 12.14 22.93 95.74 1055826
string 1442 6.51 7.65 8.91 21.50 60.49 1048690
io 1262 7.57 8.53 9.59 24.34 94.99 1048879
───────────────────────────────────────────────────────────────── 2048kB
array 565 16.64 20.12 24.35 38.16 176.99 2098685
chunk 538 17.94 21.14 24.58 35.24 184.88 2113056
string 755 12.77 14.66 16.75 32.02 155.45 2098783
io 656 14.79 16.34 18.30 37.37 188.06 2098245
Body ────────── req/sec ────────── ─────── req 50% times ───────
KB array chunk string io array chunk string io
1 23522 24581 25445 21879 0.405 0.424 0.455 0.537
10 22458 21593 22811 19156 0.472 0.430 0.487 0.568
100 7772 7431 8768 6853 1.100 1.330 1.230 1.210
256 4089 3646 4731 4037 2.150 2.410 2.050 2.210
512 2303 2021 2474 2244 3.960 4.560 3.730 4.100
1024 1170 1075 1442 1262 7.930 8.840 6.510 7.570
2048 565 538 755 656 16.640 17.940 12.770 14.790
─────────────────────────────────────────────────────────────────────
wrk -t8 -c16 -d10s
benchmarks/local/response_time_wrk.sh -w2 -t5:5 -s ssl6 -Y
Server cluster mode -w2 -t5:5, bind: ssl6
Puma repo branch 00-response-refactor
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [arm64-darwin21]
[13614] - Gracefully shutting down workers...
[13614] === puma shutdown: 2022-09-13 20:50:47 +0900 ===
[13614] - Goodbye!
5:14 Total Time
@JuanitoFatas Thanks for the data.
I updated the notes in the first post, and mentioned two possible issues, correct client (socket) error handling, and possible issues if Rack::Sendfile middleware is used.
Otherwise, I think it's time to own it and merge.