
Segfault during `cleanup_server` when bidirectional or parallel-stream tests are ended early

Open · MattCatz opened this issue 9 months ago · 5 comments

Context

  • Version of iperf3: 3.16

  • Hardware: N/A

  • Operating system (and distribution, if any): 6.5.0-26-generic # 26~22.04.1-Ubuntu

  • Other relevant information (for example, non-default compilers, libraries, cross-compiling, etc.): N/A

Bug Report

While doing some testing, I would occasionally use the wrong iperf flags/parameters and terminate the test early rather than waiting for it to run to completion.

  • Expected Behavior: Terminating a test early causes the client and server to stop testing. The client cleans up and terminates. The server cleans up and prepares for the next test.

  • Actual Behavior: The server segfaults during cleanup.

  • Steps to Reproduce

    1. Simulate a high-ish latency link on the loopback interface: tc qdisc add dev lo root netem delay 50ms
    2. Start server: iperf3 -s
    3. Start client and terminate test early: iperf3 -c 127.0.0.1 -t 10 -P 10 or iperf3 -c 127.0.0.1 -t 10 --bidir
      • It seems to be a race condition, so to have a better chance of hitting it I'll typically run something like for i in $(seq 100); do iperf3 -c 127.0.0.1 -t 10 -P 10; done and then repeatedly use Ctrl-C to kill tests.
    4. Check that the server crashed
  • Possible Solution: Adding an assert into the code here shows the root cause, e.g. assert(sp->thr != 0);. The failure indicates that a NULL value is being passed to pthread_cancel. A possible solution would be a NULL check before attempting to cancel the thread (see the sketch after this list).

  • Other observations: I was not able to reproduce the issue using 3.15 as the server.
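
For illustration, here is a minimal sketch of the kind of NULL check suggested above, assuming the per-stream thread handle is left at zero until the worker thread has been created. struct stream and cleanup_stream are hypothetical stand-ins, not the actual code in iperf_server_api.c:

```c
/* Hypothetical sketch of the suggested guard; "struct stream" stands in
 * for iperf3's per-stream state (sp->thr in iperf_server_api.c). */
#include <pthread.h>

struct stream {
    pthread_t thr;   /* assumed to stay 0 until pthread_create() fills it in */
};

static void cleanup_stream(struct stream *sp)
{
    /* Only cancel the worker if it was actually started; an early client
     * termination can reach cleanup while sp->thr is still unset, and
     * passing that value to pthread_cancel() is what crashes the server. */
    if (sp->thr != 0) {
        pthread_cancel(sp->thr);
        pthread_join(sp->thr, NULL);
        sp->thr = 0;
    }
}
```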

MattCatz · May 10 '24 16:05

You can also get a similar crash on the client side here if you queue up a bunch of client-side tests (e.g. for i in $(seq 100); do iperf3 -c 127.0.0.1 -t 10 -P 10; done) and then repeatedly start and kill the server.

MattCatz · May 10 '24 18:05

Can you try running these tests using the PR #1654 code? The issues may be related, so it seems worth testing whether that PR also fixes this one. (I am using WSL, which does not support tc qdisc ... netem ....)

davidBar-On · May 10 '24 19:05

It does not.

You can see your changes working correctly in test #1, but it still segfaults in test #2. (I added an assert to show where it was failing.)

-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
get_parameters:
{
	"tcp":	true,
	"omit":	0,
	"time":	10,
	"num":	0,
	"blockcount":	0,
	"parallel":	10,
	"len":	131072,
	"pacing_timer":	1000,
	"client_version":	"3.16+"
}
SNDBUF is 16384, expecting 0
RCVBUF is 131072, expecting 0
Accepted connection from 127.0.0.1, port 52362
Congestion algorithm is cubic
[  5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52376
Congestion algorithm is cubic
[  8] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52382
Congestion algorithm is cubic
[ 10] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52384
Congestion algorithm is cubic
[ 12] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52396
Congestion algorithm is cubic
[ 14] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52412
Congestion algorithm is cubic
[ 16] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52414
Congestion algorithm is cubic
[ 18] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52430
Congestion algorithm is cubic
[ 20] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52440
Congestion algorithm is cubic
[ 22] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52452
Congestion algorithm is cubic
[ 24] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 52462
Thread number 1 FD 5 created
Thread number 2 FD 8 created
Thread number 3 FD 10 created
Thread number 4 FD 12 created
Thread number 5 FD 14 created
Thread number 6 FD 16 created
Thread number 7 FD 18 created
Thread number 8 FD 20 created
Thread number 9 FD 22 created
Thread number 10 FD 24 created
All threads created
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100211
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100202
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100178
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100119
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100222
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100203
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100278
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100274
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100295
tcpi_snd_cwnd 10 tcpi_snd_mss 32768 tcpi_rtt 100248
interval_len 1.001124 bytes_transferred 11272192
interval forces keep
interval_len 1.001168 bytes_transferred 11403264
interval forces keep
interval_len 1.001173 bytes_transferred 11403264
interval forces keep
interval_len 1.001177 bytes_transferred 10616832
interval forces keep
interval_len 1.001181 bytes_transferred 11403264
interval forces keep
interval_len 1.001186 bytes_transferred 11141120
interval forces keep
interval_len 1.001194 bytes_transferred 10747904
interval forces keep
interval_len 1.001241 bytes_transferred 10747904
interval forces keep
interval_len 1.001247 bytes_transferred 9699328
interval forces keep
interval_len 1.001252 bytes_transferred 7733248
interval forces keep
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  10.8 MBytes  90.1 Mbits/sec                  
[  8]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 10]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 12]   0.00-1.00   sec  10.1 MBytes  84.8 Mbits/sec                  
[ 14]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 16]   0.00-1.00   sec  10.6 MBytes  89.0 Mbits/sec                  
[ 18]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 20]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 22]   0.00-1.00   sec  9.25 MBytes  77.5 Mbits/sec                  
[ 24]   0.00-1.00   sec  7.38 MBytes  61.8 Mbits/sec                  
[SUM]   0.00-1.00   sec   101 MBytes   848 Mbits/sec                  
interval_len 1.001124 bytes_transferred 11272192
interval forces keep
interval_len 1.001168 bytes_transferred 11403264
interval forces keep
interval_len 1.001173 bytes_transferred 11403264
interval forces keep
interval_len 1.001177 bytes_transferred 10616832
interval forces keep
interval_len 1.001181 bytes_transferred 11403264
interval forces keep
interval_len 1.001186 bytes_transferred 11141120
interval forces keep
interval_len 1.001194 bytes_transferred 10747904
interval forces keep
interval_len 1.001241 bytes_transferred 10747904
interval forces keep
interval_len 1.001247 bytes_transferred 9699328
interval forces keep
interval_len 1.001252 bytes_transferred 7733248
interval forces keep
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  10.8 MBytes  90.1 Mbits/sec                  
[  8]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 10]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 12]   0.00-1.00   sec  10.1 MBytes  84.8 Mbits/sec                  
[ 14]   0.00-1.00   sec  10.9 MBytes  91.1 Mbits/sec                  
[ 16]   0.00-1.00   sec  10.6 MBytes  89.0 Mbits/sec                  
[ 18]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 20]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec                  
[ 22]   0.00-1.00   sec  9.25 MBytes  77.5 Mbits/sec                  
[ 24]   0.00-1.00   sec  7.38 MBytes  61.8 Mbits/sec                  
[SUM]   0.00-1.00   sec   101 MBytes   848 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  27.9 MBytes   234 Mbits/sec                  receiver
[  8]   0.00-1.00   sec  28.0 MBytes   235 Mbits/sec                  receiver
[ 10]   0.00-1.00   sec  28.0 MBytes   235 Mbits/sec                  receiver
[ 12]   0.00-1.00   sec  26.8 MBytes   224 Mbits/sec                  receiver
[ 14]   0.00-1.00   sec  28.0 MBytes   235 Mbits/sec                  receiver
[ 16]   0.00-1.00   sec  27.6 MBytes   231 Mbits/sec                  receiver
[ 18]   0.00-1.00   sec  27.4 MBytes   229 Mbits/sec                  receiver
[ 20]   0.00-1.00   sec  27.5 MBytes   230 Mbits/sec                  receiver
[ 22]   0.00-1.00   sec  26.6 MBytes   223 Mbits/sec                  receiver
[ 24]   0.00-1.00   sec  24.0 MBytes   201 Mbits/sec                  receiver
[SUM]   0.00-1.00   sec   272 MBytes  2.28 Gbits/sec                  receiver
iperf3: the client has terminated
Thread number 1 FD 5 stopped
Thread number 2 FD 8 stopped
Thread number 3 FD 10 stopped
Thread number 6 FD 16 terminated unexpectedly
Thread number 4 FD 12 stopped
Thread number 5 FD 14 stopped
Thread number 6 FD 16 stopped
Thread number 7 FD 18 stopped
Thread number 8 FD 20 stopped
Thread number 9 FD 22 stopped
Thread number 10 FD 24 stopped
All threads stopped
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
get_parameters:
{
	"tcp":	true,
	"omit":	0,
	"time":	10,
	"num":	0,
	"blockcount":	0,
	"parallel":	10,
	"len":	131072,
	"pacing_timer":	1000,
	"client_version":	"3.16+"
}
SNDBUF is 16384, expecting 0
RCVBUF is 131072, expecting 0
Accepted connection from 127.0.0.1, port 55980
Congestion algorithm is cubic
[  5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 55982
ignoring short interval with no data
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-0.00   sec  0.00 Bytes  0.00 bits/sec                  receiver
[SUM]   0.00-0.00   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: the client has terminated
iperf3: iperf_server_api.c:433: cleanup_server: Assertion `sp->thr != 0' failed.
Aborted (core dumped)

MattCatz · May 10 '24 19:05

Thanks for testing. The second test failed because the termination happened before all the threads were created. I enhanced PR #1654 to also handle this case. Can you check whether the PR now fully resolves the issue?
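
For reference, here is a hypothetical sketch of the ordering being described, assuming each stream's thread handle starts at zero and is only filled in by pthread_create(); none of the names below come from iperf3 or PR #1654:

```c
/* Illustration of the race window: thread handles only become valid once
 * pthread_create() has run, so a client that disconnects mid-loop leaves
 * the remaining streams with handles still at zero for cleanup to trip over. */
#include <pthread.h>
#include <string.h>

#define NSTREAMS 10   /* matches -P 10 in the reproduction */

struct stream {
    pthread_t thr;    /* stays 0 for streams whose thread was never created */
};

static void *worker(void *arg) { (void)arg; return NULL; }

static void start_all(struct stream streams[NSTREAMS])
{
    memset(streams, 0, NSTREAMS * sizeof(streams[0]));
    for (int i = 0; i < NSTREAMS; i++) {
        /* If the control connection drops here, cleanup runs while
         * streams[i..NSTREAMS-1].thr are still 0, so cleanup must skip
         * (or otherwise account for) threads that were never created. */
        pthread_create(&streams[i].thr, NULL, worker, &streams[i]);
    }
}
```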

davidBar-On · May 11 '24 06:05

> Thanks for testing. The second test failed because the termination happened before all the threads were created. I enhanced PR #1654 to also handle this case. Can you check whether the PR now fully resolves the issue?

I am not able to recreate the issue using the most recent changes in PR #1654. Seems fixed.

MattCatz · May 12 '24 22:05