sonic-mgmt
[action] [PR:15226] [dualtor][mux_simulator] Fix mux simulator stuck
Description of PR
Summary: Fixes # (issue)
Type of change
- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] Test case(new/improvement)
Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [x] 202405
Approach
What is the motivation for this PR?
Active-standby Dualtor is failing to talk to mux_simulator:
# curl -v http://10.64.246.154:8082/mux/vms24-7/24
* Trying 10.64.246.154:8082...
- On the test server, the count of dropped TCP SYNs keeps increasing:
# netstat -s | grep -i listen
1531500 times the listen queue of a socket overflowed
1531501 SYNs to LISTEN sockets dropped
- The mux simulator listen queue is overflowing (Recv-Q of 129 exceeds the backlog of 128):
# ss -lnt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 129 128 0.0.0.0:8082 0.0.0.0:*
- `mux_simulator` appears to be stuck in `recvfrom`:
# strace -p 21315
strace: Process 21315 attached
recvfrom(6,
- There is no existing TCP connection on the test server or the DUT for fd 6.
mux_simulator is blocked reading from an already closed TCP connection, so subsequent HTTP requests cannot be handled. New connections pile up unaccepted, which results in the TCP listen queue overflow.
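The failure mode can be reproduced in miniature with the Python standard library. The sketch below is only an illustration, not the mux_simulator code: `OkHandler`, `responds`, and `demo` are hypothetical names, the stdlib `HTTPServer` stands in for the real server, and an idle client connection stands in for the dead peer. It shows that a single-threaded server stuck reading one connection stops answering all others.

```python
import socket
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with an empty 200."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def responds(port, timeout=1.0):
    """Return True if an HTTP GET is answered within `timeout` seconds."""
    url = "http://127.0.0.1:%d/" % port
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def demo():
    # Assumption: a single-threaded stdlib server with no socket timeout,
    # standing in for the pre-fix mux_simulator.
    server = HTTPServer(("127.0.0.1", 0), OkHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    before = responds(port)

    # A client that connects and never sends a request: the only server
    # thread now blocks reading from this socket.
    idle = socket.create_connection(("127.0.0.1", port))
    blocked = not responds(port)

    # Closing the idle connection unblocks the server again.
    idle.close()
    after = responds(port)

    server.shutdown()
    server.server_close()
    return before, blocked, after
```

In the real incident the peer was already gone, so nothing ever closed the connection and the server stayed blocked until the listen queue filled up.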
How did you do it?
- Enable `mux_simulator` to work in threaded mode.
- Set the socket timeout to 60s. If a worker thread gets stuck in `recvfrom` like this, the timeout ensures the thread exits after 60s, so no resources are leaked.
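The two changes can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the code from the PR: `MuxHandler` and `start_server` are hypothetical names, and the stdlib `ThreadingHTTPServer` stands in for whatever server mux_simulator actually runs. Each connection gets its own thread, and the handler's `timeout` attribute is applied as the per-connection socket timeout.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MuxHandler(BaseHTTPRequestHandler):
    """Hypothetical handler, not the real mux_simulator request handling."""

    # StreamRequestHandler.setup() applies this value as the socket timeout,
    # so a read on a dead connection gives up after 60 seconds.
    timeout = 60

    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(host="127.0.0.1", port=0):
    """Start a threaded HTTP server in the background and return it."""
    # Assumption: stdlib threaded server as a stand-in for "threaded mode".
    # Each accepted connection is handled in its own thread, so one stuck
    # connection cannot block the accept loop.
    server = ThreadingHTTPServer((host, port), MuxHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this arrangement, a stuck connection occupies only its own worker thread, and the timeout bounds how long that thread can live.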
How did you verify/test it?
Run mux_simulator with the change.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation
Original PR: https://github.com/sonic-net/sonic-mgmt/pull/15226