iperf
Segfault during `cleanup_server` when bidirectional or parallel-stream tests are ended early
Context
- Version of iperf3: 3.16
- Hardware: N/A
- Operating system (and distribution, if any): 6.5.0-26-generic #26~22.04.1-Ubuntu
- Other relevant information (for example, non-default compilers, libraries, cross-compiling, etc.): N/A
Bug Report
While doing some testing I would occasionally use the wrong iperf flags/parameters and would terminate the test early rather than waiting for it to run completely.
- Expected Behavior: Terminating a test early causes the client and server to stop testing. The client cleans up and terminates. The server cleans up and prepares for the next test.
- Actual Behavior: The server segfaults during cleanup.
- Steps to Reproduce
  - Simulate a high-ish latency link on the loopback interface:
    `tc qdisc add dev lo root netem delay 50ms`
  - Start the server:
    `iperf3 -s`
  - Start the client and terminate the test early:
    `iperf3 -c 127.0.0.1 -t 10 -P 10`
    or
    `iperf3 -c 127.0.0.1 -t 10 --bidir`
  - It seems to be a race condition, so to have a better chance of hitting it I'll typically run something like
    `for i in $(seq 100); do iperf3 -c 127.0.0.1 -t 10 -P 10; done`
    then repeatedly use Ctrl-C to kill tests.
  - Check whether the server crashed.
- Possible Solution: Adding an assert into the code here shows the root cause. Something like `assert(sp->thr != 0);`. This would indicate that a NULL value is being passed into `pthread_cancel`. A possible solution would be a NULL check before attempting to cancel the thread.
- Other observations: I was not able to reproduce the issue using 3.15 as the server.
  You can also get a similar crash on the client side here if you queue up a bunch of client-side tests (i.e. `for i in $(seq 100); do iperf3 -c 127.0.0.1 -t 10 -P 10; done`) then repeatedly start and kill the server.
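The check suggested under Possible Solution could look roughly like the sketch below. It is not iperf3 code: `struct stream`, the `thread_created` flag, `start_streams`, and `cleanup_streams` are hypothetical names made up for illustration. It also assumes the crash comes from cancelling a stream whose thread was never started. Because `pthread_t` is an opaque type, the sketch tracks creation with a separate flag instead of comparing the handle to 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a test stream; NOT the real iperf3 struct. */
struct stream {
    pthread_t thr;        /* only meaningful when thread_created is true */
    bool thread_created;  /* hypothetical flag, set after pthread_create succeeds */
};

/* Worker that idles until cancelled; sleep() is a POSIX cancellation point. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;
}

/* Start threads for only the first to_start streams, imitating a test that
 * is torn down before every stream has a thread. Returns how many started. */
size_t start_streams(struct stream *s, size_t n, size_t to_start)
{
    size_t started = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n && i < to_start; i++) {
        if (pthread_create(&s[i].thr, NULL, worker, NULL) == 0) {
            s[i].thread_created = true;
            started++;
        }
    }
    return started;
}

/* Cancel and join only the threads that were actually created.
 * Returns how many threads were stopped. */
size_t cleanup_streams(struct stream *s, size_t n)
{
    size_t stopped = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!s[i].thread_created)
            continue;  /* never started: nothing to cancel or join */
        if (pthread_cancel(s[i].thr) == 0 && pthread_join(s[i].thr, NULL) == 0)
            stopped++;
        s[i].thread_created = false;  /* makes a repeated cleanup a no-op */
    }
    return stopped;
}
```

With a zero-initialised array of four streams, calling `start_streams(streams, 4, 2)` and then `cleanup_streams(streams, 4)` touches only the two started threads and skips the rest, which is the situation an early Ctrl-C would create under the assumption above.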
Can you try running these tests using the PR #1654 code? The issues may be related, so it seems worth testing whether the PR also fixes this issue. (I am using WSL, which does not support `tc qdisc ... netem ...`.)
It does not.
You can see your changes working correctly in test #1, but it still segfaults in test #2. (I added an assert to show where it was failing.)
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
get_parameters:
{
"tcp": true,
"omit": 0,
"time": 10,
"num": 0,
"blockcount": 0,
"parallel": 10,
"len": 131072,
"pacing_timer": 1000,
"client_version": "3.16+"
}
SNDBUF is 16384, expecting 0
RCVBUF is 131072, expecting 0
Accepted connection from 127.0.0.1, port 52362
Congestion algorithm is cubic
[ 5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52376
Congestion algorithm is cubic
[ 8] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52382
Congestion algorithm is cubic
[ 10] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52384
Congestion algorithm is cubic
[ 12] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52396
Congestion algorithm is cubic
[ 14] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52412
Congestion algorithm is cubic
[ 16] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52414
Congestion algorithm is cubic
[ 18] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52430
Congestion algorithm is cubic
[ 20] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52440
Congestion algorithm is cubic
[ 22] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52452
Congestion algorithm is cubic
[ 24] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52462
Thread number 1 FD 5 created
Thread number 2 FD 8 created
Thread number 3 FD 10 created
Thread number 4 FD 12 created
Thread number 5 FD 14 created
Thread number 6 FD 16 created
Thread number 7 FD 18 created
Thread number 8 FD 20 created
Thread number 9 FD 22 created
Thread number 10 FD 24 created
All threads created
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100211
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100202
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100178
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100119
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100222
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100203
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100278
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100274
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100295
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100248
interval_len 1.001124 bytes_transferred 11272192
interval forces keep
interval_len 1.001168 bytes_transferred 11403264
interval forces keep
interval_len 1.001173 bytes_transferred 11403264
interval forces keep
interval_len 1.001177 bytes_transferred 10616832
interval forces keep
interval_len 1.001181 bytes_transferred 11403264
interval forces keep
interval_len 1.001186 bytes_transferred 11141120
interval forces keep
interval_len 1.001194 bytes_transferred 10747904
interval forces keep
interval_len 1.001241 bytes_transferred 10747904
interval forces keep
interval_len 1.001247 bytes_transferred 9699328
interval forces keep
interval_len 1.001252 bytes_transferred 7733248
interval forces keep
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 10.8 MBytes 90.1 Mbits/sec
[ 8] 0.00-1.00 sec 10.9 MBytes 91.1 Mbits/sec
[ 10] 0.00-1.00 sec 10.9 MBytes 91.1 Mbits/sec
[ 12] 0.00-1.00 sec 10.1 MBytes 84.8 Mbits/sec
[ 14] 0.00-1.00 sec 10.9 MBytes 91.1 Mbits/sec
[ 16] 0.00-1.00 sec 10.6 MBytes 89.0 Mbits/sec
[ 18] 0.00-1.00 sec 10.2 MBytes 85.9 Mbits/sec
[ 20] 0.00-1.00 sec 10.2 MBytes 85.9 Mbits/sec
[ 22] 0.00-1.00 sec 9.25 MBytes 77.5 Mbits/sec
[ 24] 0.00-1.00 sec 7.38 MBytes 61.8 Mbits/sec
[SUM] 0.00-1.00 sec 101 MBytes 848 Mbits/sec
interval_len 1.001124 bytes_transferred 11272192
interval forces keep
interval_len 1.001168 bytes_transferred 11403264
interval forces keep
interval_len 1.001173 bytes_transferred 11403264
interval forces keep
interval_len 1.001177 bytes_transferred 10616832
interval forces keep
interval_len 1.001181 bytes_transferred 11403264
interval forces keep
interval_len 1.001186 bytes_transferred 11141120
interval forces keep
interval_len 1.001194 bytes_transferred 10747904
interval forces keep
interval_len 1.001241 bytes_transferred 10747904
interval forces keep
interval_len 1.001247 bytes_transferred 9699328
interval forces keep
interval_len 1.001252 bytes_transferred 7733248
interval forces keep
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 10.8 MBytes 90.1 Mbits/sec
[ 8] 0.00-1.00 sec 10.9 MBytes 91.1 Mbits/sec
[ 10] 0.00-1.00 sec 10.9 MBytes 91.1 Mbits/sec
[ 12] 0.00-1.00 sec 10.1 MBytes 84.8 Mbits/sec
[ 14] 0.00-1.00 sec 10.9 MBytes 91.1 Mbits/sec
[ 16] 0.00-1.00 sec 10.6 MBytes 89.0 Mbits/sec
[ 18] 0.00-1.00 sec 10.2 MBytes 85.9 Mbits/sec
[ 20] 0.00-1.00 sec 10.2 MBytes 85.9 Mbits/sec
[ 22] 0.00-1.00 sec 9.25 MBytes 77.5 Mbits/sec
[ 24] 0.00-1.00 sec 7.38 MBytes 61.8 Mbits/sec
[SUM] 0.00-1.00 sec 101 MBytes 848 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 27.9 MBytes 234 Mbits/sec receiver
[ 8] 0.00-1.00 sec 28.0 MBytes 235 Mbits/sec receiver
[ 10] 0.00-1.00 sec 28.0 MBytes 235 Mbits/sec receiver
[ 12] 0.00-1.00 sec 26.8 MBytes 224 Mbits/sec receiver
[ 14] 0.00-1.00 sec 28.0 MBytes 235 Mbits/sec receiver
[ 16] 0.00-1.00 sec 27.6 MBytes 231 Mbits/sec receiver
[ 18] 0.00-1.00 sec 27.4 MBytes 229 Mbits/sec receiver
[ 20] 0.00-1.00 sec 27.5 MBytes 230 Mbits/sec receiver
[ 22] 0.00-1.00 sec 26.6 MBytes 223 Mbits/sec receiver
[ 24] 0.00-1.00 sec 24.0 MBytes 201 Mbits/sec receiver
[SUM] 0.00-1.00 sec 272 MBytes 2.28 Gbits/sec receiver
iperf3: the client has terminated
Thread number 1 FD 5 stopped
Thread number 2 FD 8 stopped
Thread number 3 FD 10 stopped
Thread number 6 FD 16 terminated unexpectedly
Thread number 4 FD 12 stopped
Thread number 5 FD 14 stopped
Thread number 6 FD 16 stopped
Thread number 7 FD 18 stopped
Thread number 8 FD 20 stopped
Thread number 9 FD 22 stopped
Thread number 10 FD 24 stopped
All threads stopped
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
get_parameters:
{
"tcp": true,
"omit": 0,
"time": 10,
"num": 0,
"blockcount": 0,
"parallel": 10,
"len": 131072,
"pacing_timer": 1000,
"client_version": "3.16+"
}
SNDBUF is 16384, expecting 0
RCVBUF is 131072, expecting 0
Accepted connection from 127.0.0.1, port 55980
Congestion algorithm is cubic
[ 5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 55982
ignoring short interval with no data
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-0.00 sec 0.00 Bytes 0.00 bits/sec receiver
[SUM] 0.00-0.00 sec 0.00 Bytes 0.00 bits/sec receiver
iperf3: the client has terminated
iperf3: iperf_server_api.c:433: cleanup_server: Assertion `sp->thr != 0' failed.
Aborted (core dumped)
Thanks for testing. The second test failed because the termination happened before all threads were created. I enhanced PR #1654 to also handle this case. Can you check if the PR now fully resolves the issue?
I am not able to recreate the issue using the most recent changes in PR #1654. Seems fixed.