Linux datapath does not activate GRO although available

Open jospaeth-tum opened this issue 2 years ago • 10 comments

Describe the bug

When using the Linux datapath datapath_epoll.c on a system that supports both UDP GSO and GRO, the feature test in CxPlatDataPathCalculateFeatureSupport only activates send segmentation and falls back to "normal" recvmmsg for the receive side. Hardcoding the support for receive coalescing in Datapath->Features shows that GRO does indeed work on the system and significantly improves achievable goodput (from 5.2 Gbit/s to 8.3 Gbit/s during an 8 GiB file transfer on the test system described below).

The problem seems to be that no UDP_GRO control message is received after sending a message with GSO to the loopback interface. This is possibly related to https://github.com/microsoft/msquic/issues/3865, where the test using the loopback interface also led to problems.
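
For readers unfamiliar with the control message in question, here is a minimal sketch of how a receiver can look for it after recvmsg. It is not MsQuic's actual code: the helper name find_gro_segment_size is hypothetical, and the fallback definitions of SOL_UDP and UDP_GRO are only there for older libc headers (the values are the ones from the Linux UAPI headers).

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

#ifndef SOL_UDP
#define SOL_UDP 17   /* same value as IPPROTO_UDP */
#endif
#ifndef UDP_GRO
#define UDP_GRO 104  /* from linux/udp.h, for libc headers that lack it */
#endif

/* Hypothetical helper: walk the control messages of a received msghdr and
 * return the segment size carried in a UDP_GRO control message.
 * Returns 0 if no such control message is attached. */
static int find_gro_segment_size(struct msghdr *msg)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_UDP && c->cmsg_type == UDP_GRO) {
            int seg = 0;
            /* Copy instead of casting to avoid unaligned access. */
            memcpy(&seg, CMSG_DATA(c), sizeof(seg));
            return seg;
        }
    }
    return 0;
}
```

On the affected system, a check of this kind finds no UDP_GRO control message during the loopback test, so the feature stays disabled.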

Affected OS

  • [ ] Windows
  • [X] Linux
  • [ ] macOS
  • [ ] Other (specify below)

Additional OS information

Debian 11 with Linux kernel 5.10.0-23-amd64

MsQuic version

v2.2

Steps taken to reproduce bug

  1. Use MsQuic's default choice for GSO/GRO support and evaluate performance during a file transfer
  2. Hardcode support for receive coalescing, e.g., add the following line here: Datapath->Features |= CXPLAT_DATAPATH_FEATURE_RECV_COALESCING;
  3. Repeat the measurement and compare performance

Expected behavior

MsQuic should activate receive coalescing in both cases, leading to no performance differences.

Actual outcome

Performance increases significantly when receive coalescing is enforced.

Additional details

Test system:

  • Two machines running Debian 11 with Linux kernel 5.10.0-23-amd64
  • NICs are Intel E810-C running on ICE driver v1.12.6
  • GSO and GRO are activated through ethtool for both the link to the other node and lo

jospaeth-tum avatar Oct 12 '23 10:10 jospaeth-tum

the feature test in CxPlatDataPathCalculateFeatureSupport only activates send segmentation and falls back to "normal" recvmmsg for the receive side.

What is failing in that function?

nibanks avatar Oct 12 '23 15:10 nibanks

There is not really an error in the function, but it does not activate receive coalescing (on my test system) although the feature is supported. Thus, performance is worse than it could be. I noticed that this line is not reached, so the test with the loopback interface seems to not generate the UDP_GRO control message MsQuic is searching for.

jospaeth-tum avatar Oct 12 '23 15:10 jospaeth-tum

Without that, we cannot know how to segment a GRO receive. So if that's not getting delivered, we shouldn't use the feature.

I'll need your help to understand a bit more what's going on. Does GRO not work over loopback, but somehow work over your external NIC?
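
To illustrate why the segment size matters, here is a rough sketch of splitting a coalesced receive buffer back into individual datagrams. The struct and function names are hypothetical and this is not the MsQuic implementation; it only assumes that every datagram except possibly the last has the length reported in the control message.

```c
#include <stddef.h>

/* Hypothetical description of one datagram inside a coalesced buffer. */
struct segment {
    size_t offset;  /* start of the datagram within the receive buffer */
    size_t length;  /* datagram length in bytes */
};

/* Split a receive of total_len bytes into datagrams of seg_size bytes.
 * The last datagram may be shorter. A seg_size of 0 means no segment size
 * was reported, so the buffer is treated as a single datagram.
 * Writes at most max_out entries and returns the number written. */
static size_t split_coalesced(size_t total_len, size_t seg_size,
                              struct segment *out, size_t max_out)
{
    size_t count = 0;
    size_t off = 0;

    if (total_len == 0 || max_out == 0) {
        return 0;
    }
    if (seg_size == 0) {
        seg_size = total_len;
    }
    while (off < total_len && count < max_out) {
        size_t remaining = total_len - off;
        out[count].offset = off;
        out[count].length = remaining < seg_size ? remaining : seg_size;
        off += out[count].length;
        count++;
    }
    return count;
}
```

Without the reported size, the receiver cannot tell a 3000-byte buffer holding three datagrams from one holding a single large datagram.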

nibanks avatar Oct 12 '23 15:10 nibanks

Unfortunately, I'm not sure why the control message is not getting delivered in the loopback test. GRO does work with my external NIC and is also producing the correct control messages.

jospaeth-tum avatar Oct 12 '23 15:10 jospaeth-tum

Do you have a suggestion on how to handle this scenario then? AFAIU, there isn't a good way to know without trying and failing if GRO is supported. That's why we have the current logic, but it assumed support was interface agnostic. So we test loopback on start up to see if things work. If that doesn't work, but some other interfaces do, I'm not sure what to do. We can't easily test an external NIC without a peer.

nibanks avatar Oct 12 '23 15:10 nibanks

But support is not interface agnostic, is it? AFAIU, this was also the problem with this issue regarding GSO: https://github.com/microsoft/msquic/issues/3865

I have not tested this, but is the test for GRO really necessary? As long as the UDP_GRO macro is defined, the recv call should only coalesce messages if this is supported. Otherwise, it should only yield single messages.

jospaeth-tum avatar Oct 12 '23 15:10 jospaeth-tum

A possible problem is, however, that then we can't receive multiple messages with recvmmsg, as we don't fall back.

jospaeth-tum avatar Oct 12 '23 15:10 jospaeth-tum

I don't have a good solution, nor the ability to test all these different scenarios right now. Feel free to propose a PR if you have an idea on how to go forward, and we can let the automation try it out, and ask a few others to do so as well.

nibanks avatar Oct 12 '23 15:10 nibanks

I won't be able to look into that in the next few days, but I will let you know if I find a solution :+1:

jospaeth-tum avatar Oct 12 '23 16:10 jospaeth-tum

Do you have a suggestion on how to handle this scenario then? AFAIU, there isn't a good way to know without trying and failing if GRO is supported. That's why we have the current logic, but it assumed support was interface agnostic. So we test loopback on start up to see if things work. If that doesn't work, but some other interfaces do, I'm not sure what to do. We can't easily test an external NIC without a peer.

In Tailscale we trust getsockopt() for UDP_GRO to confirm the kernel has support. If we don't get an error (EINVAL IIRC if a socket option is invalid) we proceed with setsockopt(). Technically we could condense that to just setsockopt() with no get, but broken apart is helpful for unrelated reasons. This is all to say, is the loopback testing necessary if it can lead to false negatives? I may be missing background/history on why loopback testing was chosen to start.
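
For illustration, the condensed setsockopt()-only variant could look roughly like the sketch below. This is neither Tailscale's nor MsQuic's code: the function name try_enable_udp_gro is hypothetical, the fallback macro definitions are only for older libc headers, and any error from setsockopt() is treated as "not supported" without distinguishing errno values.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

#ifndef SOL_UDP
#define SOL_UDP 17   /* same value as IPPROTO_UDP */
#endif
#ifndef UDP_GRO
#define UDP_GRO 104  /* from linux/udp.h, for libc headers that lack it */
#endif

/* Hypothetical probe: try to enable UDP_GRO on an existing UDP socket.
 * Returns 1 if the kernel accepted the option, 0 on any error
 * (unknown option, bad descriptor, and so on). */
static int try_enable_udp_gro(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_UDP, UDP_GRO, &on, sizeof(on)) != 0) {
        return 0;
    }
    return 1;
}
```

A caller would run this once per socket and fall back to plain receives when it returns 0, which avoids the loopback round trip entirely.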

jwhited avatar Oct 13 '23 20:10 jwhited