uwsgi icon indicating copy to clipboard operation
uwsgi copied to clipboard

uwsgi not closing sockets, resulting in exhausted listen

Open tuxx42 opened this issue 3 years ago • 1 comments

I am experiencing some strange behavior with uwsgi where - under certain circumstances - uwsgi fails to close sockets which have been terminated by the other end, resulting in an endless loop of:

Fri Jul 22 16:26:23 2022 - *** uWSGI listen queue of socket ":9090" (fd: 3) full !!! (101/100) ***

Upon examination of the problem i noticed that uwsgi's backlog of int listen(int socket, int backlog) is being exhausted due to uwsgi having around 100 sockets in the close-wait state.

root@demo:/home/user/# ss -a -t 'sport = :9090'
State              Recv-Q         Send-Q                 Local Address:Port                   Peer Address:Port
LISTEN             101            100                          0.0.0.0:9090                        0.0.0.0:*
CLOSE-WAIT         357            0                         172.19.0.6:9090                     172.19.0.5:59400
CLOSE-WAIT         1110           0                         172.19.0.6:9090                     172.19.0.5:59490
[...]
root@demo:/home/user# ss -a -t 'sport = :9090' | wc -l
109

The logs of my application did not show all too much unusual errors apart from an occassional

[uwsgi-websocket] "GET /v1/socket" (172.19.0.5) no PONG received in 3 seconds !!!

I am not sure if this is related to the issue.

I started to dig deeper by attaching an strace to each of the uwsgi instances:

[pid 899819] 14:50:06.628894 accept4(3, {sa_family=AF_INET, sin_port=htons(40428), sin_addr=inet_addr("172.22.0.5")}, [110->16], SOCK_NONBLOCK) = 18

I was able to match up the port of a connection in CLOSE-WAIT from ss (e.g. here 40428) with the accept4 from the strace log to discover the file descriptor id of the socket (here 18).

While I see the sockets being accept4'ed, I can not see them ever being closed again leading them to remain in CLOSE-WAIT indefinitely. I then decided to attach a gdb to the processes with the open socket file descriptor and manually close it in gdb by running (gdb) p (int) close(18)

And sure enough the number of CLOSE-WAIT sockets is immediately reduced by 1 and the uwsgi was able to accept a new incoming connection, and the particular CLOSE-WAIT disappears from ss

I also tcpdumped the traffic and I see that while the tcp close handshake stops after the application receives the FIN2 but fails to send the ACK.

14:48:46.306572  80     FIN-WAIT-1     172.22.0.5:40376 > 172.22.0.7:9090
14:48:46.350494  172.22.0.7:9090 A > 172.22.0.5:40376 (0)
14:48:46.350494  80     FIN-WAIT-2     172.22.0.5:40376 > 172.22.0.7:9090

Any ideas why uwsgi is holding these sockets open while they actually should be closed?

tuxx42 avatar Aug 03 '22 12:08 tuxx42

have you done more deeper investigation? I encountered this today, and I know how to reproduce it Because I have come SOCKET communication before UWSGI fork which is the root reason, But I don't know how to explain it

ING-XIAOJIAN avatar Mar 30 '23 11:03 ING-XIAOJIAN