VPP main thread stuck at session_wrk_handle_evts_main_rpc() when working with nginx through VCL
The DUT setup is as follows:
              (eth1 + bvi if) bridge1
                      |
http client -------- VPP + VCL + nginx(reverse proxy)
                      |
              (bvi if + eth2) bridge2
                      | --------------------------------------- http backend server
VPP starts with two workers. Session configuration:
session {
  enable
  use-app-socket-api
  event-queue-length 100000
  preallocated-sessions 100000
  evt_qs_memfd_seg
}
nginx is started with LD_PRELOAD=$LDP_PATH VCL_CONFIG=$VCL_CFG. VCL configuration:
vcl {
  heapsize 512M
  #use-mq-eventfd 1
  app-socket-api /var/run/vpp/app_ns_sockets/default
  segment-size 536870912
  add-segment-size 536870912
  event-queue-size 500000
  rx-fifo-size 65536
  tx-fifo-size 65536
}
At first, the HTTP client can GET/POST to the backend server through the nginx reverse proxy with curl, which means the HTTP traffic path is OK. But when I run a performance test with wrk and, while HTTP traffic is being forwarded, reload nginx with kill -HUP NGINX-PID, VPP gets stuck in session_wrk_handle_evts_main_rpc().
I tried VPP versions 22.10, 23.10, and 25.10; all show the same problem. Enabling or disabling use-mq-eventfd makes no difference. Increasing event-queue-length from 8192 to 100000/500000 eases the symptoms but does not remove them:
- event-queue-length = 8192: VPP gets stuck the first time nginx is reloaded.
- event-queue-length = 100000: after the first reload VPP keeps working and vppctl show sessions still displays output, but when I reload nginx a second time (30s later), VPP gets stuck.
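A minimal sketch of the reproduction sequence described above, written as a shell script. Only wrk, kill -HUP, the roughly 30s gap between reloads, and vppctl show sessions come from this report. The proxy address, the wrk thread/connection counts and duration, the warm-up time, and the 10s timeout around vppctl are assumptions chosen for illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the reload-under-load sequence; values marked "assumed" are not from the report.
PROXY_URL="${PROXY_URL:-http://192.0.2.10/}"  # assumed: placeholder address of the nginx proxy
WRK_DURATION="${WRK_DURATION:-120s}"          # assumed: long enough to span both reloads
WARMUP="${WARMUP:-10}"                        # assumed: seconds of load before the first reload
RELOAD_GAP="${RELOAD_GAP:-30}"                # from the report: second reload about 30s later
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print commands, 0 = execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Same as run, but starts the command in the background.
run_bg() {
    echo "+ $* &"
    if [ "$DRY_RUN" = "0" ]; then "$@" & fi
}

# reproduce NGINX-PID: generate load, then reload nginx twice while traffic is flowing.
reproduce() {
    nginx_pid="$1"
    run_bg wrk -t 2 -c 100 -d "$WRK_DURATION" "$PROXY_URL"  # assumed thread/connection counts
    run sleep "$WARMUP"
    for attempt in 1 2; do
        run kill -HUP "$nginx_pid"
        # If the main thread is stuck, vppctl is expected to hang; the timeout is assumed.
        run timeout 10 vppctl show sessions || echo "reload $attempt: vppctl did not answer"
        if [ "$attempt" = "1" ]; then run sleep "$RELOAD_GAP"; fi
    done
    wait  # let the background wrk run finish
}
```

After sourcing the script, reproduce NGINX-PID prints the planned commands; setting DRY_RUN=0 first executes them against a live setup.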
This is easy to reproduce. It seems to be a problem with syncing TCP connection state from the workers to the main thread; see the attachment for more details.