
Pipelining discrepancies compared with wrk

Open · galderz opened this issue on Dec 4, 2023 · 1 comment

When running with pipelining, I see a big throughput difference between wrk and a yaml benchmark that replicates the wrk setup.

With wrk:

/home/g/opt/wrk/wrk --latency -d 15 -c 256 --timeout 8 -t 3 http://192.168.1.163:8080/hello -s ../pipeline.lua -- 16
Running 15s test @ http://192.168.1.163:8080/hello
  3 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    28.47ms   15.99ms 127.90ms   64.22%
    Req/Sec    27.17k     1.55k   35.68k    77.33%
  Latency Distribution
     50%   27.40ms
     75%   39.53ms
     90%   50.57ms
     99%   64.21ms
  1216848 requests in 15.05s, 96.32MB read
Requests/sec:  80877.09
Transfer/sec:      6.40MB

With a homegrown yaml based on the one in the documentation:

[hyperfoil@in-vm]$ run plaintext-wrk -PSERVER=192.168.1.163 -PDURATION=15 -PTHREADS=3 -PTIMEOUT=8 -PPIPELINE=16 -PCONNECTIONS=256
Started run 0003
Run 0003, benchmark plaintext-wrk
Agents: in-vm[STOPPED]
Started: 2023/12/04 10:38:55.427    Terminated: 2023/12/04 10:39:16.441
NAME         STATUS      STARTED       REMAINING  COMPLETED     TOTAL DURATION               DESCRIPTION
calibration  TERMINATED  10:38:55.427             10:39:01.434  6007 ms (exceeded by 7 ms)   256 users always
test         TERMINATED  10:39:01.434             10:39:16.441  15007 ms (exceeded by 7 ms)  256 users always
[hyperfoil@in-vm]$ stats
Total stats from run 0003
PHASE        METRIC   THROUGHPUT    REQUESTS  MEAN     p50      p90       p99       p99.9     p99.99    TIMEOUTS  ERRORS  BLOCKED  2xx     3xx  4xx  5xx  CACHE
calibration  request  46.00k req/s    276326  5.51 ms  4.88 ms  11.21 ms  13.43 ms  16.25 ms  21.63 ms         0       0     0 ns  276326    0    0    0      0
test         request  46.09k req/s    691686  5.50 ms  4.85 ms  11.21 ms  13.30 ms  15.53 ms  19.79 ms         0       0     0 ns  691686    0    0    0      0

The yaml looks like this:

name: plaintext-wrk
threads: !param THREADS 2 # option -t
http:
  host: !concat [ "http://", !param SERVER localhost, ":8080" ]
  allowHttp2: false
  pipeliningLimit: !param PIPELINE 1
  sharedConnections: !param CONNECTIONS 10 # option -c
ergonomics:
  repeatCookies: false
  userAgentFromSession: false
phases:
  - calibration:
      always:
        users: !param CONNECTIONS 10 # option -c
        duration: 6s
        maxDuration: 70s # This is duration + default timeout 60s
        scenario: &scenario
          - request:
              - httpRequest:
                  GET: /hello
                  timeout: !param TIMEOUT 60s # option --timeout
                  headers:
                    - accept: text/plain # option -H
                  handler:
                    # We'll check that the response was successful (status 200-299)
                    status:
                      range: 2xx
  - test:
      always:
        users: !param CONNECTIONS 10 # option -c
        duration: !param DURATION 10s # option -d
        maxDuration: 70s # This is duration + default timeout 60s
        startAfterStrict: calibration
        scenario: *scenario

I can't compare with Hyperfoil's built-in wrk emulation because it doesn't support pipelining.

Without pipelining the difference goes away. wrk results are:

/home/g/opt/wrk/wrk --latency -d 15 -c 256 --timeout 8 -t 3 http://192.168.1.163:8080/hello
Running 15s test @ http://192.168.1.163:8080/hello
  3 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.14ms    6.15ms  37.64ms   78.02%
    Req/Sec    14.40k     1.02k   17.89k    62.00%
  Latency Distribution
     50%    2.92ms
     75%   10.93ms
     90%   17.58ms
     99%   23.17ms
  644537 requests in 15.04s, 51.02MB read
Requests/sec:  42850.63
Transfer/sec:      3.39MB

And with the yaml above:

[hyperfoil@in-vm]$ run plaintext-wrk -PSERVER=192.168.1.163 -PDURATION=15 -PTHREADS=3 -PTIMEOUT=8 -PPIPELINE=1 -PCONNECTIONS=256
Started run 0002
Run 0002, benchmark plaintext-wrk
Agents: in-vm[STOPPED]
Started: 2023/12/04 10:36:37.261    Terminated: 2023/12/04 10:36:58.277
NAME         STATUS      STARTED       REMAINING  COMPLETED     TOTAL DURATION               DESCRIPTION
calibration  TERMINATED  10:36:37.261             10:36:43.269  6008 ms (exceeded by 8 ms)   256 users always
test         TERMINATED  10:36:43.269             10:36:58.277  15008 ms (exceeded by 8 ms)  256 users always
[hyperfoil@in-vm]$ stats
Total stats from run 0002
PHASE        METRIC   THROUGHPUT    REQUESTS  MEAN     p50      p90       p99       p99.9     p99.99    TIMEOUTS  ERRORS  BLOCKED  2xx     3xx  4xx  5xx  CACHE
calibration  request  39.90k req/s    239699  6.20 ms  2.82 ms  18.74 ms  24.12 ms  28.05 ms  30.80 ms         0       0     0 ns  239699    0    0    0      0
test         request  39.88k req/s    598583  6.39 ms  2.83 ms  19.66 ms  23.86 ms  27.26 ms  32.51 ms         0       0     0 ns  598583    0    0    0      0  

galderz · Dec 4, 2023

This is not a bug per se; it comes down to how Hyperfoil implements pipelining. It is indeed able to send more than one HTTP request over the same connection without waiting for the responses, but it sends them one by one, without batching the writes; see https://github.com/Hyperfoil/Hyperfoil/blob/80c2bfc2f483ee2baf53ea26cbbe830305310462/http/src/main/java/io/hyperfoil/http/connection/Http1xConnection.java#L194-L199
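
In Netty terms, the current behaviour amounts to something like the sketch below (illustrative only, not the actual Hyperfoil code; the Channel and the request list are assumed):

import java.util.List;

import io.netty.channel.Channel;
import io.netty.handler.codec.http.HttpRequest;

// Illustrative sketch only, not Hyperfoil code: with per-request flushing,
// every pipelined request triggers its own flush, so the writes are never
// coalesced into fewer, larger socket writes even though many requests are
// allowed in flight on the connection.
final class UnbatchedWriter {
    static void write(Channel channel, List<HttpRequest> requests) {
        for (HttpRequest request : requests) {
            channel.writeAndFlush(request); // write + flush for each request
        }
    }
}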

Instead, it should perform the writes without flushing and flush only once a full batch has been written. The problem is that this is not simple to achieve.
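
A batched version would defer the flush to the end of the batch, along these lines (again only a sketch, with made-up names):

import java.util.List;

import io.netty.channel.Channel;
import io.netty.handler.codec.http.HttpRequest;

// Illustrative sketch: write the whole batch into the outbound buffer
// without flushing, then flush once, letting Netty coalesce the requests
// into fewer, larger socket writes.
final class BatchedWriter {
    static void write(Channel channel, List<HttpRequest> requests) {
        for (HttpRequest request : requests) {
            channel.write(request); // buffered, not flushed yet
        }
        channel.flush();            // a single flush for the whole batch
    }
}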

For example, suppose we implement a watermark-based mechanism on the connection that flushes once the maximum pipelining level has been reached. If there are not enough available sessions to produce the "last" request of a batch, we lose the chance to flush the requests already buffered. We cannot make progress either, because no new sessions become available until the responses to those buffered (but unflushed) requests arrive: a circular dependency that leads to starvation.

Even more simply: if the request rate is 1 request/sec, we cannot rely on a fixed watermark alone to trigger the flush, or we risk waiting N seconds (N being the pipelining level) before the whole batch is flushed.
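
For illustration, the kind of watermark-plus-linger mechanism discussed above could look like the sketch below (hypothetical names, not Hyperfoil API; it assumes all calls happen on the channel's event loop, so no synchronization is shown). The time-based fallback bounds how long a partial batch may sit unflushed at low rates, but it does not solve the session-availability coupling described above.

import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import io.netty.channel.Channel;

// Hypothetical sketch, not Hyperfoil code: flush when the watermark
// (pipelining limit) is reached, or after a bounded linger time so that a
// slow producer (e.g. 1 req/s) does not leave requests buffered for
// pipeliningLimit intervals.
final class BatchingWriter {
    private final Channel channel;
    private final int pipeliningLimit;   // the watermark
    private final long maxLingerNanos;   // upper bound on how long a partial batch may wait
    private int pending;                 // requests written but not yet flushed
    private ScheduledFuture<?> lingerTimer;

    BatchingWriter(Channel channel, int pipeliningLimit, long maxLingerNanos) {
        this.channel = channel;
        this.pipeliningLimit = pipeliningLimit;
        this.maxLingerNanos = maxLingerNanos;
    }

    void write(Object request) {
        channel.write(request);          // buffered in the outbound buffer, no flush yet
        if (++pending >= pipeliningLimit) {
            flushNow();                  // watermark reached: flush the whole batch
        } else if (lingerTimer == null) {
            // partial batch: arm a fallback flush so it cannot linger indefinitely
            lingerTimer = channel.eventLoop()
                  .schedule(this::flushNow, maxLingerNanos, TimeUnit.NANOSECONDS);
        }
    }

    private void flushNow() {
        if (lingerTimer != null) {
            lingerTimer.cancel(false);
            lingerTimer = null;
        }
        if (pending > 0) {
            pending = 0;
            channel.flush();             // one flush for the whole batch
        }
    }
}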

Doing this right requires some further investigation, and TBH I'm not sure it is worth our time unless it turns out to be a trivial change: outside of benchmarks, HTTP/1.1 pipelining is not really used in practice.

But if HTTP/2 behaves the same way in Hyperfoil, and measurements show that it needs fixing, we could use that as a chance to unify the two approaches and fix this one too.

franz1981 · Jul 10, 2024