hyper
Current thread performance http1.1
Hi, below are results from a "Hello World" micro-benchmark comparing hyper 1.0 (52f192593fb9ebcf6d3894e0c85cbf710da4decd) and uWebSockets, both running on a current_thread event loop.
# uWebSockets
divy@mini ~> wrk -d 10s --latency http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 42.75us 69.88us 5.83ms 99.54%
Req/Sec 109.29k 6.66k 121.53k 92.08%
Latency Distribution
50% 38.00us
75% 50.00us
90% 61.00us
99% 89.00us
2197037 requests in 10.10s, 142.48MB read
Requests/sec: 217523.38
Transfer/sec: 14.11MB
# Hyper
divy@mini ~> wrk -d 10s --latency http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 90.14us 608.17us 21.18ms 99.25%
Req/Sec 86.52k 9.89k 102.72k 85.15%
Latency Distribution
50% 49.00us
75% 60.00us
90% 73.00us
99% 246.00us
1738450 requests in 10.10s, 145.90MB read
Requests/sec: 172106.55
Transfer/sec: 14.44MB
This does not represent real-world performance, but it does indicate some overhead in the hyper server machinery.
Profile for the hyper run: https://share.firefox.dev/3NqxfKC
Profile for the uWebSockets run: https://share.firefox.dev/3pppj4p
I noticed that, because the API hands out an owned Request, headers have to be copied over to a HeaderMap and deallocated after every request. This can be seen in the above flamegraphs too. Do you think it's possible to improve this?
Thanks for taking a look into hyper performance! Trying to go faster is always appreciated.
I think it's fair in this case to point out that hyper does more than uWebSockets to make sure the HTTP connection/state is correct and safe. I would never want to sacrifice that.
But let's see what does exist:
- HeaderName and HeaderValue are owned, they use Bytes internally. hyper uses a single BytesMut for reading, and then slices it into Bytes (which share the same buffer, with new indices and a refcount). And as long as all the Bytes are dropped by the time it tries to read a new request, it can reuse the same allocation.
- I do see some Bytes::copy_from_slice calls in the profile, it may be worth tracking down why.
- The HeaderMap does use an allocation to store a dynamic map. I'm open to improvements to reduce the cost of reserving/inserting (as long as it doesn't change the public API).
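To make the shared-buffer idea concrete, here is a minimal sketch using only the standard library. It is an analogy, not the bytes crate's implementation: `Slice`, `split_header`, and `try_reclaim` are hypothetical names, and `Rc<Vec<u8>>` stands in for the refcounted read buffer. Each view holds a refcount plus its own indices, and the allocation can only be taken back for reuse once every view has been dropped.

```rust
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical stand-in for a refcounted view into a shared read buffer.
struct Slice {
    buf: Rc<Vec<u8>>,
    range: Range<usize>,
}

impl Slice {
    fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Split "name: value" into two views that share one allocation (no copying).
fn split_header(buf: &Rc<Vec<u8>>) -> Option<(Slice, Slice)> {
    let colon = buf.iter().position(|&b| b == b':')?;
    let mut start = colon + 1;
    while start < buf.len() && buf[start] == b' ' {
        start += 1;
    }
    let name = Slice { buf: Rc::clone(buf), range: 0..colon };
    let value = Slice { buf: Rc::clone(buf), range: start..buf.len() };
    Some((name, value))
}

/// Hand the allocation back for reuse if no views are alive;
/// otherwise return the still-shared buffer unchanged.
fn try_reclaim(buf: Rc<Vec<u8>>) -> Result<Vec<u8>, Rc<Vec<u8>>> {
    Rc::try_unwrap(buf).map(|mut v| {
        v.clear(); // keeps the capacity, drops the contents
        v
    })
}

fn main() {
    let mut storage = Vec::with_capacity(64);
    storage.extend_from_slice(b"content-length: 12");
    let buf = Rc::new(storage);

    let (name, value) = split_header(&buf).expect("header has a colon");
    println!(
        "{} -> {}",
        String::from_utf8_lossy(name.as_bytes()),
        String::from_utf8_lossy(value.as_bytes())
    );

    // While the views are alive, the buffer cannot be reclaimed.
    let buf = match try_reclaim(buf) {
        Ok(_) => unreachable!("views still hold references"),
        Err(shared) => shared,
    };

    drop(name);
    drop(value);

    // With every view dropped, the same allocation is available again.
    let reused = try_reclaim(buf).expect("no views remain");
    println!("reclaimed capacity: {}", reused.capacity());
}
```

In this picture, a handler that keeps a header value alive past the next read forces a fresh allocation, which is one way extra allocations could show up in a profile.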
There are some benchmarks in hyper: those in the benches directory check integration performance, and those in src/proto/h1/role.rs benchmark specific functions.
Just want to point out that the responses likely differ in size.
uWebSockets: 2197037 requests in 10.10s, 142.48MB read
hyper: 1738450 requests in 10.10s, 145.90MB read
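The size difference can be estimated from these two lines alone. The sketch below divides the bytes read by the request count; `bytes_per_response` is a hypothetical helper, and it assumes wrk's "MB" means 1024 * 1024 bytes.

```rust
/// Average bytes per response from wrk's totals.
/// Assumption: wrk reports "MB" in units of 1024 * 1024 bytes.
fn bytes_per_response(total_mb: f64, requests: u64) -> f64 {
    total_mb * 1024.0 * 1024.0 / requests as f64
}

fn main() {
    let uws = bytes_per_response(142.48, 2_197_037);
    let hyper = bytes_per_response(145.90, 1_738_450);

    println!("uWebSockets: {:.1} bytes/response", uws);
    println!("hyper:       {:.1} bytes/response", hyper);
    println!("difference:  {:.1} bytes", hyper - uws);
}
```

Under that assumption this works out to roughly 68 bytes per response for uWebSockets and roughly 88 for hyper, so hyper writes about 20 more bytes per request in this benchmark.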
Good point. The Date header is missing from the uWebSockets response.
uWebSockets:
> GET / HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.84.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< uWebSockets: 20
< Content-Length: 12
<
* Connection #0 to host 127.0.0.1 left intact
Hello world!⏎
Hyper:
> GET / HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.84.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 12
< date: Fri, 07 Jul 2023 03:40:47 GMT
<
* Connection #0 to host 127.0.0.1 left intact
Hello World!⏎
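As a cross-check, the two responses can be sized by hand from the curl output. This is a sketch under the assumption that every header line ends in CRLF and that the headers curl printed are exactly what was sent; `response_len` is a hypothetical helper.

```rust
/// Bytes on the wire for one response: each line of the head plus CRLF,
/// the blank line that ends the head, then the body.
fn response_len(head_lines: &[&str], body: &str) -> usize {
    let head: usize = head_lines.iter().map(|line| line.len() + 2).sum();
    head + 2 + body.len()
}

fn main() {
    let uws = response_len(
        &["HTTP/1.1 200 OK", "uWebSockets: 20", "Content-Length: 12"],
        "Hello world!",
    );
    let hyper = response_len(
        &[
            "HTTP/1.1 200 OK",
            "content-length: 12",
            "date: Fri, 07 Jul 2023 03:40:47 GMT",
        ],
        "Hello World!",
    );

    println!("uWebSockets: {} bytes", uws);
    println!("hyper:       {} bytes", hyper);
}
```

This gives 68 bytes for uWebSockets and 88 bytes for hyper. The 20-byte gap is the 37-byte date line minus the 17-byte uWebSockets line, so hyper moves more bytes per request while also formatting a date header.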