uwsgi not closing sockets, resulting in exhausted listen
I am experiencing some strange behavior with uwsgi where - under certain circumstances - uwsgi fails to close sockets which have been terminated by the other end, resulting in an endless loop of:
Fri Jul 22 16:26:23 2022 - *** uWSGI listen queue of socket ":9090" (fd: 3) full !!! (101/100) ***
Upon examination of the problem i noticed that uwsgi's backlog of int listen(int socket, int backlog) is being exhausted due to uwsgi having around 100 sockets in the close-wait state.
root@demo:/home/user/# ss -a -t 'sport = :9090'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 101 100 0.0.0.0:9090 0.0.0.0:*
CLOSE-WAIT 357 0 172.19.0.6:9090 172.19.0.5:59400
CLOSE-WAIT 1110 0 172.19.0.6:9090 172.19.0.5:59490
[...]
root@demo:/home/user# ss -a -t 'sport = :9090' | wc -l
109
The logs of my application did not show all too much unusual errors apart from an occassional
[uwsgi-websocket] "GET /v1/socket" (172.19.0.5) no PONG received in 3 seconds !!!
I am not sure if this is related to the issue.
I started to dig deeper by attaching an strace to each of the uwsgi instances:
[pid 899819] 14:50:06.628894 accept4(3, {sa_family=AF_INET, sin_port=htons(40428), sin_addr=inet_addr("172.22.0.5")}, [110->16], SOCK_NONBLOCK) = 18
I was able to match up the port of a connection in CLOSE-WAIT from ss (e.g. here 40428) with the accept4 from the strace log to discover the file descriptor id of the socket (here 18).
While I see the sockets being accept4'ed, I can not see them ever being closed again leading them to remain in CLOSE-WAIT indefinitely. I then decided to attach a gdb to the processes with the open socket file descriptor and manually close it in gdb by running
(gdb) p (int) close(18)
And sure enough the number of CLOSE-WAIT sockets is immediately reduced by 1 and the uwsgi was able to accept a new incoming connection, and the particular CLOSE-WAIT disappears from ss
I also tcpdumped the traffic and I see that while the tcp close handshake stops after the application receives the FIN2 but fails to send the ACK.
14:48:46.306572 80 FIN-WAIT-1 172.22.0.5:40376 > 172.22.0.7:9090
14:48:46.350494 172.22.0.7:9090 A > 172.22.0.5:40376 (0)
14:48:46.350494 80 FIN-WAIT-2 172.22.0.5:40376 > 172.22.0.7:9090
Any ideas why uwsgi is holding these sockets open while they actually should be closed?
have you done more deeper investigation? I encountered this today, and I know how to reproduce it Because I have come SOCKET communication before UWSGI fork which is the root reason, But I don't know how to explain it