Cannot test long UDP sessions

Open teifip opened this issue 7 years ago • 5 comments

Context

Server side iperf 3.6 (cJSON 1.5.2) on Ubuntu 16.04.1 (built from source code)

Two servers running concurrently on the same n1-highcpu-2 (2 vCPUs, 1.8 GB memory) VM on Google Cloud Platform:

  • iperf3 -s -p5201 (used to test server to client path)
  • iperf3 -s -p5202 (used to test client to server path)

Client side iperf 3.6 (cJSON 1.5.2) on Windows 10 (binary obtained from here)

Two clients running concurrently on the same i7-7820HQ @ 2.90GHz machine, one in normal mode and one in reverse mode:

  • iperf3 -c X.X.X.X -u -b1M -p5201 -R -t7400 (used to test server to client path)
  • iperf3 -c X.X.X.X -u -b1M -p5202 -t7400 (used to test client to server path)

Bug Report

The tests conclude with the same error on both servers:

[  5] 7403.00-7404.00 sec   123 KBytes  1.00 Mbits/sec  86
iperf3: error - select failed: Bad file descriptor
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
[  5] 7403.00-7404.00 sec  0.00 Bytes  0.00 bits/sec  0.039 ms  0/0 (0%)
iperf3: error - select failed: Bad file descriptor
-----------------------------------------------------------
Server listening on 5202
-----------------------------------------------------------

And both clients report a connection reset:

[  5] 7398.00-7399.00 sec   123 KBytes  1.00 Mbits/sec  0.199 ms  0/86 (0%)
iperf3: error - unable to receive control message: Connection reset by peer
[  5] 7398.00-7399.00 sec   121 KBytes   992 Kbits/sec  85
iperf3: error - unable to receive control message: Connection reset by peer

The problem occurs systematically with test durations of 2 hours or more; I have not seen it with shorter tests, such as one hour. However, I cannot say that these observations are conclusive.

This situation is particularly severe when option -J is used and the client in normal mode is launched with the --get-server-output option. Upon experiencing the problem, the server:

  • Does not produce the JSON output;
  • Does not pass the results to the client.

Therefore, the statistics at the server side are completely lost.

Notes

I see that there are past issues related to cases where the test duration at the server side appears longer than at the client side, as above. Were they supposed to be resolved in v3.6?

Issue #735 definitely has some commonalities with this one. In particular, it stands out that in my case too the server error occurs when the test duration at the server side gets 5 s longer than what was requested by the client. However, in my case, the connection under test is quite fast.

Ping statistics for X.X.X.X:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 12ms, Maximum = 13ms, Average = 12ms

There is simply no way that it can hold 4-5 seconds of packets in flight. Why do the durations at the server and client side appear different?
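To put rough numbers on that claim, here is a back-of-the-envelope sketch. It assumes the 1 Mbit/s target rate from -b1M and the 12 ms average round trip from the ping output above; the function names are made up for illustration, and using the full round-trip time gives an upper bound on what the path can hold.

```python
def bytes_in_flight(rate_bps, rtt_ms):
    """Bandwidth-delay product: bytes the path can hold at this rate and RTT."""
    # bits = rate_bps * rtt_ms / 1000, bytes = bits / 8
    return rate_bps * rtt_ms / 8000


def bytes_for_duration(rate_bps, seconds):
    """Bytes sent at this rate over the given number of seconds."""
    return rate_bps * seconds / 8


RATE_BPS = 1_000_000  # from -b1M
RTT_MS = 12           # average RTT reported by ping

# What the path can plausibly hold in flight.
print(bytes_in_flight(RATE_BPS, RTT_MS))   # 1500.0 bytes, about one datagram

# What 5 extra seconds of traffic would amount to.
print(bytes_for_duration(RATE_BPS, 5))     # 625000.0 bytes
```

Under these assumptions the path holds roughly one datagram at a time, several hundred times less than 5 seconds of traffic, so queued packets alone should not explain the extra intervals on the server side.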

Is there any reason why I may see different behavior when the test duration gets longer, say close to 2 hours? I mean, are there timeouts that may affect the TCP control connection while the tests run on UDP? Or is it some sort of timing skew between client and server that gets accumulated?

Thanks!!

teifip avatar Nov 06 '18 23:11 teifip

Our current thinking is that this could be caused by a stateful firewall that times out the control connection (during a long test the control connection can be idle for a very long time). Clock synchronization between the client and server is not required (with the exception of #842, which isn't really related to this issue).

#835 is vaguely related in the sense that it also deals with the idleness of the control connection.

Is this still an issue?

bmah888 avatar May 11 '20 18:05 bmah888

iPerf3 sends no packets to keep the TCP connection alive when running in UDP mode. Our firewall closes the inactive TCP connection, resulting in this exact problem. I have a packet capture to demonstrate this issue if necessary.

ArielPrevu3D avatar Nov 18 '22 17:11 ArielPrevu3D

Submitted a PR #1423 with a suggested fix for this issue (support TCP keepalive for the control connection).
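For readers unfamiliar with the mechanism, the following is a minimal Python sketch of what enabling TCP keepalive on a socket looks like. It is not the code from PR #1423 (iperf3 is written in C); the function name and the timer values are assumptions chosen for illustration. The per-socket timer options are platform dependent, so the sketch only sets the ones the local Python build exposes.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive on sock and tune the timers where available.

    idle_s:     seconds of idleness before the first probe is sent
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the connection is declared dead
    (all three defaults are illustrative, not iperf3's values)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # These constants exist on Linux; other platforms may lack some of them,
    # in which case the system-wide defaults stay in effect.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With keepalive probes sent well inside the firewall's idle timeout, the otherwise silent control connection keeps generating traffic during a long UDP test, which is the idea behind the suggested fix.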

davidBar-On avatar Nov 23 '22 15:11 davidBar-On

Any idea when this change will be merged? Or how can I get a binary with this change?

ankitg12 avatar Jan 09 '23 09:01 ankitg12

Any idea when this change will be merged? Or how can I get a binary with this change?

Sent a reply in PR #1423.

davidBar-On avatar Jan 10 '23 18:01 davidBar-On