Ⅰ. Issue Description
使用tengine+xquic+tongsuo+QAT测试QUIC的性能,对比不用QAT硬件加速时性能无提升
Ⅱ. Describe what happened
使用tengine+xquic+tongsuo搭建HTTP3的server环境,同时使用QAT硬件对加解密算法进行加速,但是测试效果发现使用QAT硬件无法提升HTTPS的性能
Ⅲ. Describe what you expected to happen
intel使用NGINX-QUIC(async-mode)+boringSSL+QAT,性能提升了4.6倍。
参考file:///home/zhoubin/Downloads/IETF%20QUIC%20Acceleration%20using%20Intel%C2%AE%20QuickAssist%20Technology%20-Intel%20QAT%20(1).pdf
期望使用tengine+xquic+tongsuo+QAT也能有4倍左右的性能提升
Ⅳ. How to reproduce it (as minimally and precisely as possible)
Ⅴ. Anything else we need to know?
大概分析了握手过程中的调用栈,原因应该是tengine+xquic不支持async_mode导致的。
Thread 1 "tengine" hit Breakpoint 1, qat_engine_ecdh_compute_key (out=out@entry=0x7fff4fcf0b18, outlen=outlen@entry=0x7fff4fcf0b20, pub_key=0x55a8bd93d870, ecdh=0x55a8bd92ff30) at qat_hw_ec.c:795
795 {
(gdb) bt
#0 qat_engine_ecdh_compute_key (out=out@entry=0x7fff4fcf0b18, outlen=outlen@entry=0x7fff4fcf0b20, pub_key=0x55a8bd93d870, ecdh=0x55a8bd92ff30) at qat_hw_ec.c:795
#1 0x00007fcdb725a43f in QAT_ECDH_compute_key (out=out@entry=0x55a8bd92ba10, outlen=, outlen@entry=32, pub_key=, eckey=eckey@entry=0x55a8bd92ff30, KDF=KDF@entry=0x0)
at qat_prov_ecdh.c:582
#2 0x00007fcdb725a964 in qat_keyexch_ecdh_plain_derive (outlen=, psecretlen=0x7fff4fcf0cb0, secret=0x55a8bd92ba10 "\233\020\031\347\255U", vpecdhctx=0x55a8bd95af20)
at qat_prov_ecdh.c:666
#3 qat_keyexch_ecdh_derive (vpecdhctx=0x55a8bd95af20, secret=0x55a8bd92ba10 "\233\020\031\347\255U", psecretlen=0x7fff4fcf0cb0, outlen=) at qat_prov_ecdh.c:763
#4 0x00007fcdb79aa2e2 in ssl_derive (s=s@entry=0x55a8bd964d60, privkey=, pubkey=pubkey@entry=0x55a8bd8f6d60, gensecret=gensecret@entry=1) at ssl/s3_lib.c:4125
#5 0x00007fcdb79eae7d in tls_construct_stoc_key_share (s=0x55a8bd964d60, pkt=0x7fff4fcf0e30, context=, x=, chainidx=)
at ssl/statem/extensions_srvr.c:1779
#6 0x00007fcdb79e0eac in tls_construct_extensions (s=s@entry=0x55a8bd964d60, pkt=pkt@entry=0x7fff4fcf0e30, context=512, x=x@entry=0x0, chainidx=chainidx@entry=0) at ssl/statem/extensions.c:881
#7 0x00007fcdb79facb2 in tls_construct_server_hello (s=0x55a8bd964d60, pkt=0x7fff4fcf0e30) at ssl/statem/statem_srvr.c:2560
#8 0x00007fcdb79ec49d in write_state_machine (s=0x55a8bd964d60) at ssl/statem/statem.c:890
#9 state_machine (s=0x55a8bd964d60, server=1) at ssl/statem/statem.c:482
#10 0x00007fcdb79b8a08 in SSL_do_handshake (s=0x55a8bd964d60) at ssl/ssl_lib.c:4075
#11 0x00007fcdb7b955e1 in xqc_ssl_do_handshake (ssl=0x55a8bd964d60) at /home/zhoubin/code/quic/xquic/src/tls/babassl/xqc_ssl_if.c:119
#12 0x00007fcdb7b921af in xqc_tls_do_handshake (tls=0x55a8bd913480) at /home/zhoubin/code/quic/xquic/src/tls/xqc_tls.c:394
#13 0x00007fcdb7b928da in xqc_tls_process_crypto_data (tls=0x55a8bd913480, level=XQC_ENC_LEV_INIT, crypto_data=0x55a8bd8e6f60 "\001", data_len=559)
at /home/zhoubin/code/quic/xquic/src/tls/xqc_tls.c:568
#14 0x00007fcdb7b630d0 in xqc_read_crypto_stream (stream=0x55a8bd95c2c0) at /home/zhoubin/code/quic/xquic/src/transport/xqc_stream.c:729
#15 0x00007fcdb7b76cd2 in xqc_process_crypto_frame (conn=0x55a8bd913680, packet_in=0x7fff4fcf1190) at /home/zhoubin/code/quic/xquic/src/transport/xqc_frame.c:559
#16 0x00007fcdb7b756b2 in xqc_process_frames (conn=0x55a8bd913680, packet_in=0x7fff4fcf1190) at
/home/zhoubin/code/quic/xquic/src/transport/xqc_frame.c:224
#17 0x00007fcdb7b7497c in xqc_packet_decrypt_single (c=0x55a8bd913680, packet_in=0x7fff4fcf1190) at /home/zhoubin/code/quic/xquic/src/transport/xqc_packet.c:183
#18 0x00007fcdb7b74b35 in xqc_packet_process_single (c=0x55a8bd913680, packet_in=0x7fff4fcf1190) at /home/zhoubin/code/quic/xquic/src/transport/xqc_packet.c:226
#19 0x00007fcdb7b4efbf in xqc_conn_process_packet (c=0x55a8bd913680, packet_in_buf=0x55a8bcecdbc8 <packet+232> <incomplete sequence \303>, packet_in_size=1216, recv_time=1706106787763823)
at /home/zhoubin/code/quic/xquic/src/transport/xqc_conn.c:3528
#20 0x00007fcdb7b4112f in xqc_engine_packet_process (engine=0x55a8bd8b5590, packet_in_buf=0x55a8bcecdbc8 <packet+232> <incomplete sequence \303>, packet_in_size=1216,
local_addr=0x55a8bcecdb54 <packet+116>, local_addrlen=16, peer_addr=0x55a8bcecdae0 , peer_addrlen=16, recv_time=1706106787763823, user_data=0x7fcdb48aa010)
at /home/zhoubin/code/quic/xquic/src/transport/xqc_engine.c:1227
#21 0x000055a8bce4995e in ngx_xquic_dispatcher_process_packet (c=c@entry=0x7fcdb48aa010, packet=packet@entry=0x55a8bcecdae0 ) at modules/ngx_http_xquic_module/ngx_xquic_recv.c:403
#22 0x000055a8bce49ad5 in ngx_xquic_event_recv (ev=0x55a8bd885570) at modules/ngx_http_xquic_module/ngx_xquic_recv.c:312
#23 0x000055a8bcdceca3 in ngx_epoll_process_events (cycle=, timer=, flags=) at src/event/modules/ngx_epoll_module.c:972
#24 0x000055a8bcdc2c1e in ngx_process_events_and_timers (cycle=cycle@entry=0x55a8bd85eb90) at src/event/ngx_event.c:284
#25 0x000055a8bcdcdf44 in ngx_single_process_cycle (cycle=cycle@entry=0x55a8bd85eb90) at src/os/unix/ngx_process_cycle.c:336
#26 0x000055a8bcda0dd1 in main (argc=, argv=) at src/core/nginx.c:416
以上是抓取的调用栈,结合代码分析,所有函数的调用过程都是同步的,也就是说需要等到QAT完成加解密之后SSL_do_handshake才会返回,由于tengine只有一个线程,在硬件加解密期间无法处理其他的连接请求,从而导致性能无法提升。
使用tengine+tongsuo+qat测试HTTP2时性能是可以大幅提升的,因为tengine支持ssl_async on的配置,调用SSL_do_handshake之前将ssl的mode设置成异步,这样就完全将openssl的async_job(协程)用起来了,SSL_do_handshake在硬件没有完成加解密时就会返回,这样tengine就能立刻处理下一个连接请求,就能大幅提升性能了。
非常非常希望tengine+xquic可以支持tongsuo的异步ssl,这样使用如QAT加速卡就能大幅提升性能了
Ⅵ. Environment:
- Tengine version (use
sbin/nginx -V
):
- OS (e.g. from /etc/os-release):
- Kernel (e.g.
uname -a
):
- Others: