
Some performance clarifications in this mass hysteria

Open ghost opened this issue 8 years ago • 12 comments

Hi,

Nice work, definitely in the same category as the work I do myself (I try to make Node.js a little faster, just like you try to make Python a little faster). We both use a native core and wrap it up, so I wanted to see how these benchmarks held up in comparison:

These are not pipelined, and let me explain why. If I do a pipelined benchmark I too can easily get to at least 1 million requests/second (I have gotten to 5 million echoes/second with WebSockets, and HTTP should be similar in performance). As we all know, benchmarking your own solutions is very susceptible to confirmation bias, and it's easy to get carried away and exaggerate the results.

The problem with HTTP pipelining is that you only get better performance if the responses are written out in exactly the order their respective requests arrived. This makes it very unlikely to get close to the performance of a perfect HTTP pipelining benchmark. In most cases, HTTP pipelining will actually harm performance, since if you respond asynchronously your responses will need to block and buffer up in memory until they can be written to the network in sorted order. That's why wrk doesn't enable pipelining by default, but instead sends one request and waits for the response before sending another. It simply becomes too fabricated to include HTTP pipelining performance in an HTTP benchmark.
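The ordering constraint described above can be sketched in a few lines of Python. This is a minimal simulation, not any framework's actual implementation: responses that finish out of order must sit in a buffer until every earlier response has been written.

```python
# Minimal simulation of the HTTP/1.1 pipelining ordering rule:
# responses must hit the wire in request order, so a response that
# finishes early is buffered until all its predecessors are done.

def flush_in_order(finished_order):
    """Given the order in which handlers finish, return the wire order."""
    pending = {}          # responses done but blocked behind earlier ones
    next_to_write = 0     # index of the next response we may write
    writes = []
    for idx in finished_order:
        pending[idx] = f"response-{idx}"
        # Drain everything that is now contiguous from the front.
        while next_to_write in pending:
            writes.append(pending.pop(next_to_write))
            next_to_write += 1
    return writes

# Requests 0..3 are pipelined; handlers finish in the order 2, 0, 3, 1.
print(flush_in_order([2, 0, 3, 1]))
# The wire order is still response-0 .. response-3; 2 and 3 were buffered.
```

The memory held in `pending` is exactly the buffering cost the comment above describes: one slow handler at the front of the queue blocks every response behind it.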

ghost avatar Feb 02 '17 06:02 ghost

Hi, thanks for the insights; many people have asked for non-pipelined results.

> http pipelining will actually harm performance since your responses will need to block and buffer up in memory until they can be written to network in a sorted manner if you respond async.

I feel I made it clear in the article that pipelining doesn't always help and that you have to be careful when you use it. Still, I claim that even if it's arguably useless with web browsers, it can be used in controlled environments, like chatty web services over an internal network, where you could have an idea of request completion times and speculatively order GET requests. I think working on heuristics and collecting statistical data to learn from on the go (just like databases do when planning queries) is the way to do it correctly. I don't do that yet, and I'm not saying it's easy, but I feel it's viable.
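The speculative-ordering idea above could look roughly like this. A hypothetical sketch only: the `estimated_ms` table and route names are stand-ins for the per-route statistics a server would collect at runtime, not anything Japronto implements today.

```python
# Hypothetical sketch of speculative GET ordering for pipelining:
# if each handler's completion time can be estimated (e.g. from a
# running average learned on the go), issuing the fastest requests
# first minimises head-of-line blocking in the response stream.

estimated_ms = {            # stand-in for learned per-route statistics
    "/report": 120.0,
    "/user/42": 3.5,
    "/health": 0.2,
}

def speculative_order(paths, stats):
    """Order pipelined GETs fastest-first; unknown routes go last."""
    return sorted(paths, key=lambda p: stats.get(p, float("inf")))

print(speculative_order(["/report", "/user/42", "/health"], estimated_ms))
# ['/health', '/user/42', '/report']
```

Fastest-first is the same intuition as shortest-job-first scheduling: the expensive `/report` response no longer delays the two cheap ones behind it.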

I'm going to look into µWS as a source of inspiration. I admit I might have worded things more carefully before publishing; in the next round of benchmarks I'm going to focus more on non-pipelined results.

squeaky-pl avatar Feb 02 '17 09:02 squeaky-pl

Testing with pipelining mitigates the impact of the overhead the testing tool introduces + the network stack (even if it's loopback), by essentially spreading out those costs over the totality of requests. In my opinion, it better isolates the test to the throughput of the framework.

However, what doesn't reflect a real-world usage is pipelining over a single, or very limited, set of users / concurrent connections.

A trade-off would be to test pipelined batches of, e.g., 100 requests, which better reflects real-world usage and also exercises the framework's connection-lifecycle overhead.
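Such a batch is simple to construct: an HTTP/1.1 pipelined batch is just N requests concatenated and written in one go, which is what wrk's pipeline.lua does. A raw-bytes sketch, with the host and batch size purely illustrative:

```python
# Build a pipelined batch of N GET requests as a single payload.
# Writing this buffer in one send() amortises the per-request
# syscall and network overhead across the whole batch.

def pipelined_batch(host, path="/", n=100):
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"\r\n"
    ).encode("ascii")
    return request * n  # one write carries the whole batch

payload = pipelined_batch("localhost:8080", n=100)
print(payload.count(b"GET / HTTP/1.1"))  # 100 requests in one buffer
```

A batch size of 100 sits between wrk's sequential default (effectively a batch of 1) and an unbounded pipeline, which is exactly the trade-off proposed above.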

raulk avatar Feb 02 '17 09:02 raulk

I was running the benchmarks on my local machine. Machine configuration:

- Processor Name: Intel Core i7
- Processor Speed: 2.8 GHz
- Number of Processors: 1
- Total Number of Cores: 4
- L2 Cache (per Core): 256 KB
- L3 Cache: 6 MB
- Memory: 16 GB

The benchmarks are as follows:

Japronto: 82102 req/sec

```
$ ./wrk -t12 -c400 -d30s http://localhost:8080
Running 30s test @ http://localhost:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.90ms  369.01us  16.70ms   92.20%
    Req/Sec     6.88k     2.94k   12.33k    57.24%
  2471446 requests in 30.10s, 216.84MB read
  Socket errors: connect 157, read 147, write 0, timeout 0
Requests/sec:  82102.64
Transfer/sec:      7.20MB
```

Sanic: 33791 req/sec

```
$ ./wrk -t12 -c400 -d30s http://localhost:8080
Running 30s test @ http://localhost:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.03ms    2.13ms  24.85ms   66.20%
    Req/Sec     2.83k     1.27k    5.66k    72.03%
  1016182 requests in 30.07s, 135.67MB read
  Socket errors: connect 157, read 147, write 0, timeout 0
Requests/sec:  33791.95
Transfer/sec:      4.51MB
```

Tornado: 3026 req/sec

```
$ ./wrk -t12 -c400 -d30s http://localhost:8080
Running 30s test @ http://localhost:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    77.70ms    6.06ms 321.62ms   93.20%
    Req/Sec    254.04    130.97   490.00    58.46%
  91100 requests in 30.10s, 7.99MB read
  Socket errors: connect 157, read 193, write 0, timeout 0
Requests/sec:   3026.97
Transfer/sec:    271.95KB
```

Nodejs: 30889 req/sec

```
$ ./wrk -t12 -c400 -d30s http://localhost:8080
Running 30s test @ http://localhost:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.69ms    1.24ms  57.78ms   95.64%
    Req/Sec     2.59k     1.26k   15.56k    58.75%
  929836 requests in 30.10s, 103.75MB read
  Socket errors: connect 157, read 147, write 0, timeout 0
Requests/sec:  30889.09
Transfer/sec:      3.45MB
```

Meinheld: 53368 req/sec

```
$ ./wrk -t12 -c400 -d30s http://localhost:8080
Running 30s test @ http://localhost:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.48ms    0.87ms  47.32ms   94.69%
    Req/Sec     4.48k     2.09k   25.45k    67.30%
  1606445 requests in 30.10s, 271.17MB read
  Socket errors: connect 157, read 99, write 0, timeout 0
Requests/sec:  53368.97
Transfer/sec:      9.01MB
```

Golang: 28374 req/sec

```
$ ./wrk -t12 -c400 -d30s http://localhost:8080
Running 30s test @ http://localhost:8080
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.16ms    1.02ms 154.74ms   90.04%
    Req/Sec     4.79k     2.33k    9.07k    56.26%
  853160 requests in 30.07s, 104.96MB read
  Socket errors: connect 157, read 2317, write 2, timeout 0
Requests/sec:  28374.96
Transfer/sec:      3.49MB
```

| Framework | req/sec |
| --- | --- |
| 1. Japronto | 82102 |
| 2. Meinheld | 53368 |
| 3. Sanic | 33791 |
| 4. Nodejs | 30889 |
| 5. Golang | 28374 |
| 6. Tornado | 3026 |

Japronto is very fast, as advertised. Looking forward to seeing more features like those of the Tornado or Node.js web servers.

chaitanya9186 avatar Feb 06 '17 21:02 chaitanya9186

@chaitanya9186 Thanks for taking the time to benchmark all the contestants without pipelining, greatly appreciated.

squeaky-pl avatar Feb 06 '17 21:02 squeaky-pl

This is really very impressive! What happens if the route takes 200ms with, say, a blocking DB call and some data munging?

aphillipo avatar Feb 06 '17 22:02 aphillipo

hey @chaitanya9186 could you please share the code you used to make the benchmark?

felipevolpone avatar Feb 18 '17 12:02 felipevolpone

@felipevolpone I think he used the code from the benchmarks directory in the repo.

squeaky-pl avatar Feb 18 '17 13:02 squeaky-pl

Benchmark image link from @alexhultman is down. Would be great to show this data on the main github page for those worried it was an unfair comparison due to pipelining.

gwicksted avatar Mar 29 '17 16:03 gwicksted

@gwicksted That's not how marketing works. Japronto has no benefit from showing µWS on its front page and will not do so (even less so given that I'm at 3-6 million req/sec in my development branch now :wink:). However, Japronto's results are bound by the Python scripting VM, so it's not a fair comparison; my JavaScript wrapper completely clogs up at these performance levels.

ghost avatar Apr 19 '17 13:04 ghost

@alexhultman I agree 100%! Sorry for the confusion. I think the non-pipelining numbers are still quite impressive and never meant to imply displaying µWS on the front page but rather two separate items: 1) restore the image in this ticket for reference and 2) display the data from @chaitanya9186 as a graph on the front page.

gwicksted avatar Apr 19 '17 14:04 gwicksted

What would be interesting, and probably fairer, is to stop using Japronto's own pipeline.lua (which is a little over the top) and instead use the more toned-down pipeline.lua of upstream wrk. Japronto's pipeline.lua has something like 24 pipelined GET / requests, while the upstream pipeline.lua has only 3. For me, the performance gain is pretty much linear in the number of GET / requests I put in pipeline.lua: with the upstream script I get 800k, and with no pipelining I get 320-350k, so the differences are very significant.

Edit: Turned out I actually had the image still.

ghost avatar Apr 19 '17 14:04 ghost

Some more web framework benchmarks to read

https://github.com/the-benchmarker/web-frameworks

eirnym avatar Apr 12 '19 11:04 eirnym