Current thread performance http1.1

Open littledivy opened this issue 2 years ago • 3 comments

Hi, below are results from a "Hello World" micro-benchmark comparing hyper 1.0 (52f192593fb9ebcf6d3894e0c85cbf710da4decd) and uWebSockets, both running on a current_thread event loop.

# uWebSocket
divy@mini ~> wrk -d 10s --latency http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    42.75us   69.88us   5.83ms   99.54%
    Req/Sec   109.29k     6.66k  121.53k    92.08%
  Latency Distribution
     50%   38.00us
     75%   50.00us
     90%   61.00us
     99%   89.00us
  2197037 requests in 10.10s, 142.48MB read
Requests/sec: 217523.38
Transfer/sec:     14.11MB

# Hyper
divy@mini ~> wrk -d 10s --latency http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    90.14us  608.17us  21.18ms   99.25%
    Req/Sec    86.52k     9.89k  102.72k    85.15%
  Latency Distribution
     50%   49.00us
     75%   60.00us
     90%   73.00us
     99%  246.00us
  1738450 requests in 10.10s, 145.90MB read
Requests/sec: 172106.55
Transfer/sec:     14.44MB

This does not represent real-world performance, but it does indicate some overhead in the hyper server machinery.

Profile for the hyper run: https://share.firefox.dev/3NqxfKC
Profile for the uWebSockets run: https://share.firefox.dev/3pppj4p

I noticed that, because of the API, Request is owned, so headers have to be copied into a HeaderMap and deallocated after every request. This can be seen in the flamegraphs above too. Do you think it's possible to improve this?

littledivy avatar Jun 29 '23 12:06 littledivy

Thanks for taking a look into hyper performance! Trying to go faster is always appreciated.

I think it's fair in this case to point out that hyper does more than uWebSockets to make sure the HTTP connection/state is correct and safe. I would never want to sacrifice that.

But let's see what does exist:

  • HeaderName and HeaderValue are owned; they use Bytes internally. hyper uses a single BytesMut for reading, and then slices it into Bytes (which share the same buffer, with new indices and a refcount). As long as all the Bytes are dropped by the time it tries to read a new request, it can reuse the same allocation.
  • I do see some Bytes::copy_from_slice calls in the profile, it may be worth tracking down why.
  • The HeaderMap does use an allocation to store a dynamic map. I'm open to improvements to reduce the cost of reserving/inserting (as long as it doesn't change the public API).
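
To make the first bullet concrete, here is a rough std-only model of the idea. It is not how the bytes crate is implemented: `Slice`, `slice`, and `reclaim` are hypothetical names, and `Arc<Vec<u8>>` stands in for the shared read buffer. It only illustrates that slices are a refcounted pointer plus indices, and that the allocation can be reused once every slice has been dropped.

```rust
use std::ops::Range;
use std::sync::Arc;

/// A read-only view into a shared buffer: a refcounted handle plus indices.
/// Simplified stand-in for the role `Bytes` plays, not its real layout.
struct Slice {
    buf: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl Slice {
    fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Hypothetical helper: hand out a view without copying the bytes.
fn slice(buf: &Arc<Vec<u8>>, range: Range<usize>) -> Slice {
    Slice { buf: Arc::clone(buf), range }
}

/// Prepare the buffer for the next read. Returns true when the existing
/// allocation was reused, which is only possible if no slices are alive.
fn reclaim(buf: &mut Arc<Vec<u8>>) -> bool {
    if let Some(v) = Arc::get_mut(buf) {
        v.clear(); // keeps the capacity, so no new allocation is needed
        return true;
    }
    // Some slice still points at the old buffer: start a fresh one.
    *buf = Arc::new(Vec::new());
    false
}

fn main() {
    let mut read_buf =
        Arc::new(b"GET / HTTP/1.1\r\nHost: 127.0.0.1:3000\r\n\r\n".to_vec());

    // Header name and value share the read buffer; nothing is copied.
    let name = slice(&read_buf, 16..20);
    let value = slice(&read_buf, 22..36);
    println!(
        "{}: {}",
        String::from_utf8_lossy(name.as_bytes()),
        String::from_utf8_lossy(value.as_bytes())
    );

    // While the header slices are alive, the allocation cannot be reused.
    println!("reused while headers alive: {}", reclaim(&mut read_buf.clone()));

    // Once the request (and its headers) is dropped, it can.
    drop(name);
    drop(value);
    println!("reused after drop: {}", reclaim(&mut read_buf));
}
```

In this model, a handler that holds on to a header value past the end of the request would force a fresh allocation for the next read, which is the condition the bullet describes.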

There are some benchmarks in hyper: integration benchmarks in the benches directory, and function-specific benchmarks in src/proto/h1/role.rs.

seanmonstar avatar Jul 06 '23 14:07 seanmonstar

Just want to point out that the responses likely differ in size:

uWebSockets: 2197037 requests in 10.10s, 142.48MB read
hyper: 1738450 requests in 10.10s, 145.90MB read
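
As a quick check of those totals, the average response size can be estimated by dividing bytes read by the request count. This is a small sketch, assuming wrk's "MB" means 1024 * 1024 bytes; `bytes_per_response` is a name made up for illustration.

```rust
/// Average bytes per response from wrk's summary line.
/// Assumption: wrk reports "MB" as 1024 * 1024 bytes.
fn bytes_per_response(total_mb: f64, requests: u64) -> f64 {
    total_mb * 1024.0 * 1024.0 / requests as f64
}

fn main() {
    let uws = bytes_per_response(142.48, 2_197_037);
    let hyper = bytes_per_response(145.90, 1_738_450);

    println!("uWebSockets: {:.1} bytes/response", uws);
    println!("hyper:       {:.1} bytes/response", hyper);
    println!("difference:  {:.1} bytes", hyper - uws);
}
```

Under that assumption this works out to roughly 68 bytes per response for uWebSockets and roughly 88 for hyper, so hyper writes about 20 more bytes per request.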

fakeshadow avatar Jul 07 '23 03:07 fakeshadow

Good point. The Date header is missing from the uWebSockets response.

uWebSocket:

> GET / HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.84.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< uWebSockets: 20
< Content-Length: 12
<
* Connection #0 to host 127.0.0.1 left intact
Hello world!⏎ 

Hyper:

> GET / HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.84.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 12
< date: Fri, 07 Jul 2023 03:40:47 GMT
<
* Connection #0 to host 127.0.0.1 left intact
Hello World!⏎ 
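
For reference, the size of each response on the wire can be counted from the curl output above. This sketch assumes CRLF line endings and that curl shows every header the server sent; `response_len` is a hypothetical helper.

```rust
/// Size in bytes of an HTTP/1.1 200 response on the wire.
/// Assumes each line ends in CRLF and `headers` is the complete header list.
fn response_len(headers: &[&str], body: &str) -> usize {
    let status_line = "HTTP/1.1 200 OK\r\n".len();
    let header_bytes: usize = headers.iter().map(|h| h.len() + 2).sum();
    // The extra 2 bytes are the blank line that ends the header section.
    status_line + header_bytes + 2 + body.len()
}

fn main() {
    let uws = response_len(&["uWebSockets: 20", "Content-Length: 12"], "Hello world!");
    let hyper = response_len(
        &["content-length: 12", "date: Fri, 07 Jul 2023 03:40:47 GMT"],
        "Hello World!",
    );

    println!("uWebSockets: {} bytes", uws);
    println!("hyper:       {} bytes", hyper);
}
```

This gives 68 bytes for uWebSockets and 88 bytes for hyper. The 20-byte gap is the 37-byte date line minus the 17-byte uWebSockets line.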

littledivy avatar Jul 07 '23 03:07 littledivy