
Increase TCP bandwidth for small messages

davidBar-On opened this issue 9 months ago • 2 comments

  • Version of iperf3 (or development branch, such as master or 3.1-STABLE) to which this pull request applies: master

  • Issues fixed (if any): #1078

  • Brief description of code changes (suitable for use as a commit message):

Suggested enhancement to resolve iperf3's low TCP bandwidth with small message sizes, compared to iperf2 and netperf. This is done by receiving all the messages of a sent burst as one read. In my environment, with -l 1500, throughput is increased by about 35% for a single stream and by more than 50% in multi-stream tests.

It seems that the main reason iperf2 and netperf achieve higher bandwidth for small messages is that iperf3 sends and receives using the same message size, while in iperf2 (and probably netperf) the two sizes differ. For example, iperf2's default TCP receive message size is 128K.

Notes:

  1. Receiving all the burst messages as one message is assumed to be o.k. (and not "cheating"), based on iperf2 (and netperf) behavior.
  2. Since read waits (with a timeout) for a full message, when the test is limited by byte/block count or file size (-n, -k or --file), read may block because the number of bytes sent is not a multiple of the message size. Therefore, when these parameters are set, only -l bytes are read at a time. A future enhancement may be that read does not wait for a full message (as in iperf2) and counts received blocks based on the number of bytes received.
  3. The TCP receive message size is extended (by multiplying blksize by the burst size) up to a maximum of MAX_BLOCKSIZE (1MB). If this is too large, it may be limited to MAX_TCP_BUFFER (512KB) or DEFAULT_TCP_BLKSIZE (128KB).
  4. One advantage of using the sent burst size for the receive message is that sending and receiving stay in sync. However, other approaches could be used, e.g. allowing a second value for -l giving the receiver message length, with the first value being the sender size (and the receiver default). That would be similar to the iperf2 approach, but I believe using the burst size is better.
  5. There is an issue with using the burst size, which actually already exists in iperf3 before this PR. When the test bandwidth is set with -b, the default burst size is set to 1. Assume the network bandwidth is 1Gbps. A test that does not set -b uses a burst size of 10, while a test with -b10G uses a burst size of 1. Therefore, the first test will show higher bandwidth, even though the second test did not effectively limit the bandwidth. A workaround is to also set the burst size with -b10G/10, but users may still not understand this burst-size difference. (By the way, with multi-threading the sending burst may be redundant, so it may be possible to remove the burst loop when sending. With that approach, burst size would only apply to TCP receives, perhaps allowing the default burst to remain unchanged when -b is set.)

davidBar-On avatar Apr 30 '24 06:04 davidBar-On

Thank you for the pull request! We've been looking into this, and we have some questions about the methodology. When taking performance measurements for different message sizes, it's important to consider whether we're pushing bytes as quickly as possible or measuring the performance of the entire system when sending and receiving these differently-sized messages. We tend to prefer the latter.

On the other hand, if iperf3 is limiting the performance due to implementation inefficiencies, that is something we would like to address. For example, we removed most of the select calls from the send and receive loops, since they were interfering with the measurement.

We're not sure whether increasing the size of these messages by receiving and reading them as larger blocks reflects the measurements we would like to take.

swlars avatar May 24 '24 17:05 swlars

A few comments that may help the evaluation:

it's important to consider if we're pushing bytes as quickly as possible or measuring the performance of the entire system with sending and receiving these differently-sized messages. We tend to prefer the latter.

The suggested change is only on the receiving side. There is no change to the way the bytes are pushed. If I understood the iperf2 code correctly, this is how it works, i.e. sending each message separately but receiving bytes as they arrive. The approach is also similar to the SKIP-RX-COPY approach, in that the overhead of handling the messages on the receiving side may be ignored.

For example, we removed most of the select calls from the send and receiving loops, since they were interfering with the measurement.

As I mentioned in the description, because of that change the burst size practically no longer affects sending, so the default burst size could be changed to 1 with no impact on iperf3 performance. If the default burst is changed to 1, this PR will not change the default iperf3 behavior (because the change is to read a "burst" number of messages). The burst value could then take on the meaning this PR gives it. If this approach is desired, I can enhance the PR code accordingly.

davidBar-On avatar May 25 '24 08:05 davidBar-On