Multiple performance regressions
Make sure that during the verification tests the user-space servers run on a vanilla kernel (our patch still changes kernel behavior even with Tempesta undefined).
Test configuration
Benchmark results on CPU i9-12900HK (0-11 are performance cores) for the current master d7294b86a563bd18d5af51f701821357ade089de (the same as release-0.7 as of Jul 13) vs Nginx 1.18.0 built with OpenSSL 3.0.2. Tempesta FW and Nginx are running in a VM with 2 vCPUs bound to performance cores 0 and 2. All the benchmark tools were run from the host machine. The VM uses multiqueue virtio-net.
Tempesta config:
listen 192.168.100.4:443 proto=https;
listen 192.168.100.4:80;
frang_limits {
http_methods GET HEAD;
http_uri_len 1000;
}
block_action attack reply;
block_action error reply;
srv_group default {
server 192.168.100.4:8000;
}
vhost tempesta-tech.com {
tls_certificate /root/tempesta/etc/tfw-root.crt;
tls_certificate_key /root/tempesta/etc/tfw-root.key;
tls_tickets secret="f00)9eR59*_/22" lifetime=7200;
resp_hdr_set Strict-Transport-Security "max-age=31536000; includeSubDomains";
proxy_pass default;
}
cache 1;
cache_fulfill * *;
http_chain redirection_chain {
uri == "/blog" -> 301 = /blog/1;
uri == "/blog/" -> 301 = /blog/1;
uri == "/services" -> 301 = /development-services;
uri == "/services.html" -> 301 = /development-services;
uri == "/c++-services" -> 301 = /development-services;
uri == "/index.html" -> 301 = /index;
uri == "/company.html" -> 301 = /company;
uri == "/blog/fast-programming-languages-c-c++-rust-assembly" -> 301 = /blog/fast-programming-languages-c-cpp-rust-assembly;
-> tempesta-tech.com;
}
http_chain {
host == "tempesta-tech.com" -> redirection_chain;
}
Nginx config serving /var/www/tempesta-tech.com with a 26KB index file or /var/www/html with a 10-byte index:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
events {
worker_connections 65535;
multi_accept on;
use epoll;
}
worker_rlimit_nofile 65535;
http {
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
keepalive_requests 100000;
types_hash_max_size 2048;
include /etc/nginx/mime.types;
default_type text/html;
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
ssl_certificate /root/tempesta/etc/tfw-root.crt;
ssl_certificate_key /root/tempesta/etc/tfw-root.key;
ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
ssl_prefer_server_ciphers on;
server {
listen 192.168.100.4:8000;
listen 192.168.100.4:8443 default_server ssl http2;
#root /var/www/tempesta-tech.com;
root /var/www/html;
index index;
server_name _;
add_header X-Crash-1377 ' ';
location = /blog { # FIXME: this URI still redirects to http://tempesta-tech.com:8000/blog/
return 301 https://tempesta-tech.com/blog/1;
}
location = /blog/ {
return 301 https://tempesta-tech.com/blog/1;
}
location ~ /services(|.html)$ {
return 301 https://tempesta-tech.com/development-services;
}
location ~ /c\+\+-services$ {
return 301 https://tempesta-tech.com/development-services;
}
location = /index.html {
return 301 https://tempesta-tech.com/index;
}
location ~ /blog/fast-programming-languages-c-c\+\+-rust-assembly$ {
return 301 https://tempesta-tech.com/blog/fast-programming-languages-c-cpp-rust-assembly;
}
error_page 403 404 /oops;
}
}
TLS regression
We saw the problem earlier on migration from the Linux kernel 4.14 to 5.10, see https://github.com/tempesta-tech/tempesta/issues/1504#issuecomment-946040195. Also see the tail latency problem in https://github.com/tempesta-tech/tempesta/issues/1434
$ taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 192.168.100.4 443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 10; HANDSHAKES 209609
HANDSHAKES/sec: MAX 22239; AVG 20959; 95P 12057; MIN 12057
LATENCY (ms): MIN 5; AVG 14; 95P 7; MAX 856
$ taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 192.168.100.4 8443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 10; HANDSHAKES 148105
HANDSHAKES/sec: MAX 15718; AVG 14808; 95P 10052; MIN 10052
LATENCY (ms): MIN 16; AVG 64; 95P 111; MAX 1724
$ taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 192.168.100.4 443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 10; HANDSHAKES 61622
HANDSHAKES/sec: MAX 7071; AVG 6160; 95P 4372; MIN 4372
LATENCY (ms): MIN 258; AVG 406; 95P 534; MAX 564
$ taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 192.168.100.4 8443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 9; HANDSHAKES 61392
HANDSHAKES/sec: MAX 7184; AVG 6496; 95P 5937; MIN 5937
LATENCY (ms): MIN 63; AVG 488; 95P 584; MAX 763
The reference could be the FOSDEM'21 demo (the results were pretty stable and we ran it on different machines):
- abbreviated handshakes are now less than 2 times faster than OpenSSL, while previously they were more than 2 times faster;
- full handshakes are now slower, while previously they were almost 2 times faster. The difference is quite small and could be just a testing deviation (I made only one run), but we still should not see even equal numbers.
OpenSSL 3 also integrated Bernstein elliptic curve multiplication, so it is expected to be faster than OpenSSL 1, which we tested previously. But there is still no reason why Tempesta TLS should be slower.
10 bytes cached response, no keep-alive
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 15.38ms 37.70ms 1.57s 97.00%
Req/Sec 2.99k 409.46 5.75k 71.40%
354473 requests in 30.10s, 128.12MB read
Requests/sec: 11776.08
Transfer/sec: 4.26MB
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 44.59ms 17.90ms 121.42ms 70.45%
Req/Sec 2.78k 288.37 4.36k 75.34%
329486 requests in 30.07s, 83.58MB read
Requests/sec: 10957.63
Transfer/sec: 2.78MB
Nginx is only negligibly slower. I'd suppose that wrk uses abbreviated TLS handshakes, but I'm not sure and this should be verified.
10 bytes cached response, keep-alive
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 23.86ms 20.21ms 77.83ms 73.18%
Req/Sec 11.83k 6.28k 25.26k 78.99%
1404153 requests in 30.09s, 482.08MB read
Requests/sec: 46671.73
Transfer/sec: 16.02MB
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 12.62ms 4.77ms 112.47ms 77.13%
Req/Sec 19.98k 1.42k 22.05k 92.35%
2368764 requests in 30.05s, 612.20MB read
Requests/sec: 78831.90
Transfer/sec: 20.37MB
Nginx is almost 2 times faster than Tempesta.
26KB cached response, no keep-alive
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 25.35ms 39.93ms 1.58s 96.93%
Req/Sec 2.04k 331.83 4.76k 71.52%
242204 requests in 30.08s, 5.92GB read
Requests/sec: 8052.70
Transfer/sec: 201.53MB
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 51.40ms 19.75ms 273.29ms 70.91%
Req/Sec 2.51k 223.67 3.17k 75.76%
297959 requests in 30.08s, 7.25GB read
Requests/sec: 9906.66
Transfer/sec: 246.84MB
Nginx is still faster, but not so dramatically. IIRC, on today's demo we saw reversed results: Tempesta behaved better on the smaller file than on the large one.
26KB cached response, keep-alive
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 87.61ms 108.18ms 1.94s 86.76%
Req/Sec 4.18k 564.97 8.59k 74.43%
495844 requests in 30.09s, 12.12GB read
Socket errors: connect 0, read 0, write 0, timeout 7
Requests/sec: 16477.60
Transfer/sec: 412.44MB
$ taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 24.38ms 2.69ms 123.94ms 84.14%
Req/Sec 10.29k 755.72 11.72k 79.12%
1219107 requests in 30.09s, 29.67GB read
Requests/sec: 40511.45
Transfer/sec: 0.99GB
Nginx shows about a 2.5 times better result.
Conclusions
- Given that I observed somewhat different results today with the same software versions, the same hardware, and the same command, there are some testing methodology issues. A best practice must be documented in the wiki.
- I didn't profile any of the cases, but there could be networking issues on our side, e.g. extra copies.
- Another possible networking issue could be in TCP, e.g. unacknowledged segments or something like this; we need to trace connections.
TODO
Also need to test a large-body workload (VOD), about 5-10MB.
One of the options to debug TCP latency issues is to use SO_TIMESTAMPING, e.g. generate a massive workload and trace a single testing TCP stream with the timestamps. References:
- https://netdevconf.info/0x17/sessions/talk/so_timestamping-powering-fleetwide-rpc-monitoring.html
- https://www.youtube.com/watch?v=RjNEbTaqnX4 (the paper)
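The SO_TIMESTAMPING approach can be sketched as below. This is a minimal, Linux-only Python sketch, not the tracing harness itself; the numeric constants (`SO_TIMESTAMPING = 37` and the `SOF_*` flags) are assumed from `<linux/net_tstamp.h>` since Python's socket module does not export them. It sends one UDP datagram over loopback and pulls the software TX timestamp from the socket error queue:

```python
import socket
import struct
import time

# Assumed Linux constants from <linux/net_tstamp.h> (not in the socket module).
SO_TIMESTAMPING = 37
SOF_TIMESTAMPING_TX_SOFTWARE = 1 << 1   # request software TX timestamps
SOF_TIMESTAMPING_SOFTWARE = 1 << 4      # report software timestamps
SOF_TIMESTAMPING_OPT_TSONLY = 1 << 11   # don't loop the packet payload back
SCM_TIMESTAMPING = SO_TIMESTAMPING      # cmsg type mirrors the option number

def tx_software_timestamp():
    """Send one UDP datagram over loopback and fetch its TX timestamp."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.setsockopt(
        socket.SOL_SOCKET, SO_TIMESTAMPING,
        SOF_TIMESTAMPING_TX_SOFTWARE
        | SOF_TIMESTAMPING_SOFTWARE
        | SOF_TIMESTAMPING_OPT_TSONLY)
    tx.sendto(b"probe", rx.getsockname())
    # The TX timestamp is queued on the socket error queue; reads from
    # MSG_ERRQUEUE never block, so poll with a deadline.
    deadline = time.monotonic() + 2.0
    while time.monotonic() < deadline:
        try:
            _, ancdata, _, _ = tx.recvmsg(64, 512, socket.MSG_ERRQUEUE)
        except OSError:
            time.sleep(0.01)
            continue
        for level, ctype, data in ancdata:
            if level == socket.SOL_SOCKET and ctype == SCM_TIMESTAMPING:
                # struct scm_timestamping holds three struct timespec;
                # ts[0] is the software timestamp.
                sec, nsec = struct.unpack("ll", data[:16])
                return sec, nsec
    return None

if __name__ == "__main__":
    print(tx_software_timestamp())
```

For the real investigation the same options apply to TCP sockets, plus SOF_TIMESTAMPING_OPT_ID to correlate each timestamp with a byte offset in the stream.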
BPF_PROG_TYPE_SOCK_OPS also allows tracing TCP issues such as retransmissions and window changes.
HTTP/2 benchmark
Current master as of 690cb94ab4b5e4379f41c0c7013f07ce0806610e, with config
listen 192.168.100.4:443 proto=h2;
listen 192.168.100.4:80;
frang_limits {
client_header_timeout 20;
client_body_timeout 10;
http_header_chunk_cnt 10;
http_methods GET HEAD;
http_uri_len 1000;
http_resp_code_block 400 403 404 100 10;
}
# Allow only following characters in URI (no '%'): /a-zA-Z0-9&?:-._=
http_uri_brange 0x2f 0x41-0x5a 0x61-0x7a 0x30-0x39 0x26 0x3f 0x3a 0x2d 0x2e 0x5f 0x3d;
block_action attack reply;
block_action error reply;
srv_group default {
server 192.168.100.4:8000;
}
tls_match_any_server_name;
tls_certificate /root/tempesta/etc/tfw-root.crt;
tls_certificate_key /root/tempesta/etc/tfw-root.key;
vhost tempesta-tech.com {
tls_tickets secret="f00)9eR59*_/22" lifetime=7200;
resp_hdr_set Strict-Transport-Security "max-age=31536000; includeSubDomains";
proxy_pass default;
}
cache 1;
cache_fulfill * *;
#access_log on;
http_chain redirection_chain {
uri == "/blog" -> 301 = /blog/1;
uri == "/blog/" -> 301 = /blog/1;
uri == "/services" -> 301 = /development-services;
uri == "/services.html" -> 301 = /development-services;
uri == "/c++-services" -> 301 = /development-services;
uri == "/index.html" -> 301 = /index;
uri == "/company.html" -> 301 = /company;
uri == "/blog/fast-programming-languages-c-c++-rust-assembly" -> 301 = /blog/fast-programming-languages-c-cpp-rust-assembly;
-> tempesta-tech.com;
}
http_chain {
host == "tempesta-tech.com" -> redirection_chain;
}
and the old website index page (~26KB).
$ taskset --cpu-list 1,2,4,6 h2load -n 1000000 -c 1024 -t 4 https://tempesta-tech.com
starting benchmark...
spawning thread #0: 256 total client(s). 250000 total requests
spawning thread #1: 256 total client(s). 250000 total requests
spawning thread #2: 256 total client(s). 250000 total requests
spawning thread #3: 256 total client(s). 250000 total requests
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES128-GCM-SHA256
Server Temp Key: ECDH P-256 256 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
finished in 27.03s, 36736.66 req/s, 915.98MB/s
requests: 1000000 total, 993168 started, 993168 done, 993168 succeeded, 6832 failed, 6832 errored, 0 timeout
status codes: 993168 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.18GB (25966282770) total, 214.65MB (225080802) headers (space savings 24.36%), 23.92GB (25688290320) data
min max mean sd +/- sd
time for request: 33us 388.79ms 23.10ms 18.06ms 79.81%
time for connect: 137.49ms 274.73ms 217.24ms 29.47ms 69.22%
time to 1st byte: 201.56ms 307.52ms 239.35ms 24.53ms 55.85%
req/s : 0.00 56.33 43.69 8.06 76.46%
The same workload against Nginx:
$ taskset --cpu-list 1,2,4,6 h2load -n 1000000 -c 1024 -t 4 https://tempesta-tech.com:8443
starting benchmark...
spawning thread #0: 256 total client(s). 250000 total requests
spawning thread #1: 256 total client(s). 250000 total requests
spawning thread #2: 256 total client(s). 250000 total requests
spawning thread #3: 256 total client(s). 250000 total requests
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
Server Temp Key: X25519 253 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
finished in 37.75s, 26307.76 req/s, 653.34MB/s
requests: 1000000 total, 993168 started, 993168 done, 993168 succeeded, 6832 failed, 6832 errored, 0 timeout
status codes: 993168 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.09GB (25863137721) total, 124.08MB (130105008) headers (space savings 35.15%), 23.92GB (25688290320) data
min max mean sd +/- sd
time for request: 56us 517.08ms 32.86ms 21.60ms 72.57%
time for connect: 217.08ms 394.19ms 334.81ms 29.56ms 71.19%
time to 1st byte: 306.27ms 549.34ms 361.49ms 33.89ms 64.50%
req/s : 0.00 38.09 30.64 5.50 66.21%
RPS, latencies, TTFB are better for Tempesta FW, but not so significantly.
Nginx on the standard Ubuntu kernel 5.15.0-86-generic:
$ taskset --cpu-list 1,2,4,6 h2load -n 1000000 -c 1024 -t 4 https://tempesta-tech.com:8443
starting benchmark...
spawning thread #0: 256 total client(s). 250000 total requests
spawning thread #1: 256 total client(s). 250000 total requests
spawning thread #2: 256 total client(s). 250000 total requests
spawning thread #3: 256 total client(s). 250000 total requests
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
client could not connect to host
TLS Protocol: TLSv1.2
Cipher: ECDHE-ECDSA-AES256-GCM-SHA384
Server Temp Key: X25519 253 bits
Application protocol: h2
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
finished in 27.87s, 35631.10 req/s, 884.90MB/s
requests: 1000000 total, 993168 started, 993168 done, 993168 succeeded, 6832 failed, 6832 errored, 0 timeout
status codes: 993168 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.09GB (25863498283) total, 124.42MB (130465570) headers (space savings 34.97%), 23.92GB (25688290320) data
min max mean sd +/- sd
time for request: 41us 365.74ms 23.85ms 12.87ms 78.71%
time for connect: 202.26ms 435.84ms 341.74ms 41.72ms 76.20%
time to 1st byte: 309.13ms 476.61ms 365.89ms 38.78ms 56.54%
req/s : 0.00 72.89 43.55 12.67 74.22%
So apparently there is a kernel regression.
Need to create a new Wiki page about HTTP/2 performance to report the current performance and make the results reproducible.
We should use the -m option for h2load when checking performance. By default the max concurrent streams is 1, and in this case we will not see any changes in working with multiple streams. In tempesta-test we use -c 10 and -m 100:
-m, --max-concurrent-streams=N Max concurrent streams to issue per session. When http/1.1 is used, this specifies the number of HTTP pipelining requests in-flight. Default: 1
I have the following results on our server:
for nginx:
taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 94.242.233.20 8443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 9; HANDSHAKES 29436
HANDSHAKES/sec: MAX 3899; AVG 3062; 95P 1015; MIN 1015
LATENCY (ms): MIN 162.772; AVG 908.555; 95P 1397.94; MAX 2521.93
taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 94.242.233.20 8443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 9; HANDSHAKES 274905
HANDSHAKES/sec: MAX 34469; AVG 30146; 95P 7001; MIN 7001
LATENCY (ms): MIN 0.399567; AVG 56.5873; 95P 65.331; MAX 76.1464
for Tempesta:
taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com -l 1000 -t 4 -T 10 94.242.233.20 443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 8; HANDSHAKES 19854
HANDSHAKES/sec: MAX 3269; AVG 2137; 95P 931; MIN 931
LATENCY (ms): MIN 25.5909; AVG 1188.36; 95P 1756.09; MAX 2783.18
taskset --cpu-list 0,2,4,6 ./tls-perf -q --sni tempesta-tech.com --tickets on -l 1000 -t 4 -T 10 94.242.233.20 443
( All peers are active, start to gather statistics )
========================================
TOTAL: SECONDS 9; HANDSHAKES 276967
HANDSHAKES/sec: MAX 38332; AVG 30516; 95P 2638; MIN 2638
LATENCY (ms): MIN 0.12646; AVG 51.5548; 95P 57.4831; MAX 60.4444
10 bytes response
nginx:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 12.32ms 8.03ms 68.78ms 71.24%
Req/Sec 6.65k 573.02 7.73k 90.37%
792874 requests in 30.10s, 658.60MB read
Requests/sec: 26340.64
Transfer/sec: 21.88MB
Tempesta:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.88ms 2.66ms 33.44ms 96.10%
Req/Sec 5.77k 1.38k 7.33k 72.29%
611480 requests in 30.08s, 572.66MB read
Requests/sec: 20328.04
Transfer/sec: 19.04MB
On branch MekhanikEvgenii/fix-socket-cpu-migration:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.88ms 3.83ms 371.71ms 92.83%
Req/Sec 7.55k 1.13k 9.60k 72.94%
893831 requests in 30.09s, 835.97MB read
Requests/sec: 29702.13
Transfer/sec: 27.78MB
10 bytes keep-alive
nginx:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.55ms 5.30ms 173.44ms 82.61%
Req/Sec 84.80k 12.83k 105.94k 71.15%
10089405 requests in 30.01s, 8.23GB read
Requests/sec: 336167.84
Transfer/sec: 280.84MB
Tempesta:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.77ms 3.22ms 384.87ms 99.72%
Req/Sec 48.36k 3.86k 54.57k 98.48%
5729633 requests in 30.08s, 5.13GB read
Requests/sec: 190455.56
Transfer/sec: 174.67MB
On branch MekhanikEvgenii/fix-socket-cpu-migration:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.71ms 1.91ms 202.20ms 95.45%
Req/Sec 48.27k 3.45k 55.82k 98.73%
5724774 requests in 30.09s, 5.13GB read
Requests/sec: 190224.66
Transfer/sec: 174.70MB
10KB response
nginx:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com:8443
Running 30s test @ https://tempesta-tech.com:8443
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.85ms 5.74ms 61.71ms 82.56%
Req/Sec 7.27k 714.25 8.29k 86.87%
867649 requests in 30.07s, 335.12MB read
Requests/sec: 28850.70
Transfer/sec: 11.14MB
Tempesta:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.06ms 3.13ms 226.67ms 90.75%
Req/Sec 7.43k 0.99k 9.49k 79.93%
880554 requests in 30.05s, 824.65MB read
Requests/sec: 29302.12
Transfer/sec: 27.44MB
On branch MekhanikEvgenii/fix-socket-cpu-migration:
taskset --cpu-list 0,2,4,6 wrk -c 1000 -t 4 -d 30 -H "Connection: close" https://tempesta-tech.com
Running 30s test @ https://tempesta-tech.com
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.86ms 3.11ms 283.83ms 86.17%
Req/Sec 7.47k 1.08k 9.55k 76.65%
884835 requests in 30.11s, 829.50MB read
Requests/sec: 29390.95
Transfer/sec: 27.55MB
Testing environment: a 3-CPU VM for Tempesta and a 3-CPU VM for the client. 612 bytes of content length; 210 bytes of headers on Nginx, 257 bytes of headers on Tempesta. Default minimal config with caching enabled.
An interesting thing: when testing Tempesta, I can't load the processor more than 50%, no matter how many CPUs are used to generate traffic. For both cases (1 and 3 client CPUs) I got at most 50% average CPU load. However, Nginx has an average load of around 100% when 3 CPUs are used to generate traffic.
Tempesta:
taskset --cpu-list 5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.03s, 271824.35 req/s, 207.64MB/s
requests: 5436487 total, 5445987 started, 5436487 done, 5436487 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 5436487 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.06GB (4354630287) total, 886.57MB (929639277) headers (space savings 24.00%), 3.10GB (3327130044) data
min max mean sd +/- sd
time for request: 935us 267.49ms 31.21ms 15.22ms 68.37%
time for connect: 4.78ms 40.09ms 26.07ms 12.22ms 35.00%
time to 1st byte: 18.30ms 53.28ms 39.96ms 10.40ms 59.00%
req/s : 2237.13 3182.46 2718.05 220.84 65.00%
In: 50 Mbit/s
Out: 1.83 GBit/s
taskset --cpu-list 3,4,5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.03s, 224867.65 req/s, 171.99MB/s
requests: 4497353 total, 4506853 started, 4497353 done, 4497353 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4497353 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.36GB (3606881306) total, 737.71MB (773544716) headers (space savings 23.89%), 2.56GB (2752380036) data
min max mean sd +/- sd
time for request: 250us 661.34ms 40.37ms 28.09ms 78.48%
time for connect: 7.61ms 36.24ms 25.25ms 8.26ms 71.00%
time to 1st byte: 18.72ms 51.68ms 35.63ms 9.60ms 61.00%
req/s : 1527.13 4271.32 2248.38 520.91 63.00%
In: 105 Mbit/s
Out: 1.45 GBit/s
taskset --cpu-list 5 h2load -c10 -m95 -t2 -D20 https://ubuntu
finished in 20.00s, 288863.05 req/s, 220.94MB/s
requests: 5777261 total, 5778211 started, 5777261 done, 5777261 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 5777261 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.32GB (4633363742) total, 947.66MB (993688892) headers (space savings 23.89%), 3.29GB (3535683732) data
min max mean sd +/- sd
time for request: 213us 15.70ms 2.55ms 1.17ms 80.37%
time for connect: 2.80ms 6.68ms 4.30ms 1.59ms 70.00%
time to 1st byte: 4.29ms 11.70ms 7.56ms 2.94ms 60.00%
req/s : 27561.91 30657.87 28883.85 1175.32 50.00%
In: 41 Mbit/s
Out: 1.95 GBit/s
taskset --cpu-list 3,4,5 h2load -c200 -m1 -t2 -D20 https://ubuntu
finished in 20.04s, 219628.00 req/s, 167.98MB/s
requests: 4392560 total, 4392760 started, 4392560 done, 4392560 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4392560 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.28GB (3522841520) total, 720.52MB (755520320) headers (space savings 23.89%), 2.50GB (2688246720) data
min max mean sd +/- sd
time for request: 100us 5.95ms 684us 199us 73.68%
time for connect: 20.75ms 37.25ms 33.79ms 3.30ms 78.50%
time to 1st byte: 34.85ms 40.94ms 37.69ms 1.91ms 55.50%
req/s : 1081.90 1124.74 1098.06 15.69 70.50%
In: 183 Mbit/s
Out: 1.47 GBit/s
Nginx:
taskset --cpu-list 5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.03s, 231835.10 req/s, 165.38MB/s
requests: 4636702 total, 4646202 started, 4636702 done, 4636702 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4636702 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.23GB (3468257996) total, 521.78MB (547130836) headers (space savings 33.33%), 2.64GB (2837661624) data
min max mean sd +/- sd
time for request: 544us 123.87ms 39.47ms 16.11ms 60.45%
time for connect: 3.22ms 30.94ms 17.57ms 8.01ms 58.00%
time to 1st byte: 15.96ms 89.42ms 52.64ms 13.97ms 66.00%
req/s : 1512.95 3509.12 2318.01 842.96 79.00%
In: 34 Mbit/s
Out: 1.38 GBit/s
taskset --cpu-list 3,4,5 h2load -c100 -m95 -t2 -D20 https://ubuntu
finished in 20.04s, 220740.35 req/s, 157.47MB/s
requests: 4414807 total, 4424307 started, 4414807 done, 4414807 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4414807 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.08GB (3302280536) total, 496.81MB (520947226) headers (space savings 33.33%), 2.52GB (2701861884) data
min max mean sd +/- sd
time for request: 180us 109.29ms 41.96ms 11.95ms 62.94%
time for connect: 3.60ms 32.29ms 13.93ms 7.22ms 65.00%
time to 1st byte: 14.84ms 84.05ms 44.77ms 19.05ms 65.00%
req/s : 1794.75 3344.97 2207.21 589.51 78.00%
In: 43 Mbit/s
Out: 1.28 GBit/s
PING Flood Handling Benchmark
Related issue: https://github.com/tempesta-tech/tempesta/issues/2117.
Install python3.12 and golang.
Refer to these source files: https://gist.github.com/kingluo/07e66502b420a96ceaa5dd430140f43b
- test python3 h2 + openssl (optionally with KTLS enabled)
sudo python3.12 h2_server.py
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1
Use btm to observe the network throughput.
- test tempesta
sudo -E TFW_CFG_PATH=$HOME/tempesta-ping.conf ./scripts/tempesta.sh --restart
./ping_flood -address 192.168.2.1:443 -threads 1 -connections 10 -debug 1
- ping_flood will print an error if the ping ack sequence is not continuous.
- You can see that python3 has more than 10 times higher throughput than tempesta.
- The memory usage of python3 remains the same, but the memory usage of tempesta keeps increasing (OOM).
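For reference, the frames involved are HTTP/2 PING frames (RFC 7540, section 6.7); the sketch below builds and parses them in Python. Packing a sequence number into the 8-byte opaque payload is an assumption about the gist's convention, implied by the continuity check mentioned above:

```python
import struct

PING_TYPE = 0x6  # HTTP/2 frame type for PING
FLAG_ACK = 0x1   # ACK flag: the peer echoes the opaque payload back

def ping_frame(seq, ack=False):
    """Build a PING frame carrying `seq` in the 8-byte opaque payload."""
    # Frame header: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream id.
    # PING always has an 8-byte payload and must be sent on stream 0.
    payload = struct.pack(">Q", seq)
    header = struct.pack(">BHBBI", 0, len(payload), PING_TYPE,
                         FLAG_ACK if ack else 0, 0)
    return header + payload

def parse_ping(frame):
    """Return (frame type, is_ack, sequence number) from a raw PING frame."""
    length = int.from_bytes(frame[:3], "big")
    ftype, flags = frame[3], frame[4]
    seq = struct.unpack(">Q", frame[9:9 + length])[0]
    return ftype, bool(flags & FLAG_ACK), seq
```

A compliant server must answer every PING with a PING ACK echoing the payload, so a flood of PINGs forces an equal flood of ACKs; a gap in the echoed sequence numbers is what ping_flood reports as an error.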