kvm-guest-drivers-windows
[netkvm] Continuous transmission of small TCP packets can produce transmit packets split across too many physical pages, ultimately leading to packet loss in the network card.
Describe the bug
When sending small TCP packets, the Nagle algorithm is enabled by default in Windows, so the protocol stack combines the small packets into larger ones before handing them to the driver. When such a packet reaches the driver layer, the scatter/gather list obtained through NdisMAllocateNetBufferSGList for a single NET_BUFFER contains many fragments, but the network card has a limitation (for example, a maximum of 64 fragments per packet), which causes some of the packet's data to be lost.
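For context, here is a minimal sketch (not taken from the netkvm sources; the handler name and MAX_HW_FRAGMENTS are assumptions) of where that fragment count becomes visible to a miniport. The scatter/gather list requested with NdisMAllocateNetBufferSGList is delivered to the miniport's MiniportProcessSGList callback, and its NumberOfElements field is the value that can exceed the NIC's per-packet limit.

#include <ndis.h>

// Sketch of a MiniportProcessSGList handler, illustration only.
VOID ExampleProcessSGListHandler(PDEVICE_OBJECT DeviceObject, PVOID Reserved,
                                 PSCATTER_GATHER_LIST SGList, PVOID Context)
{
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(Reserved);
    UNREFERENCED_PARAMETER(Context);

    const ULONG MAX_HW_FRAGMENTS = 64;  // assumed per-packet hardware cap
    if (SGList->NumberOfElements > MAX_HW_FRAGMENTS)
    {
        // The condition described above: one NET_BUFFER was mapped to more
        // DMA fragments than the device can accept in a single packet, so
        // the driver must copy/coalesce, or part of the data is lost.
    }
    // ... per-fragment descriptor setup would normally follow here ...
}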
To Reproduce
Steps to reproduce the behavior. Python code, for example:
import random
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('<server_ip>', 8001))  # assumed setup: a single connection reused for all sends
while True:
    random1 = random.randint(0, 99)
    random2 = random.randint(0, 9999)
    message = "{},{},{}".format(random1, '12233333', random2).encode('utf-8')
    sock.sendall(message)
sock.close()
Expected behavior
I think a fragment-count limit should be added at the driver level, such as 64. For a NET_BUFFER with more than 64 fragments, the scattered fragments should be reassembled into one page through a copy operation.
MSDN says: "Miniport drivers can optimize the transmission of small or highly fragmented packets by copying them to a preallocated buffer with a known physical address. This approach avoids mapping that is not required and therefore improves system performance."
https://learn.microsoft.com/en-us/windows-hardware/drivers/network/ndis-scatter-gather-dma
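To illustrate the suggested fallback, here is a rough user-space model (not driver code; the Fragment structure and the 64-fragment limit are assumptions for the sketch): if a packet's scatter/gather list has more entries than the hardware accepts, copy the fragments into one preallocated contiguous buffer and submit a single-fragment descriptor instead.

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of one DMA fragment of a transmit packet.
struct Fragment {
    const uint8_t *data;
    size_t length;
};

constexpr size_t kMaxHwFragments = 64;  // assumed per-packet hardware limit

// If the packet exceeds the hardware fragment limit, coalesce it by copying
// into 'bounce' (a preallocated, physically contiguous buffer in the real
// driver) and return a single fragment describing the copy. Otherwise the
// original fragment list is returned untouched.
std::vector<Fragment> CoalesceIfNeeded(const std::vector<Fragment> &frags,
                                       std::vector<uint8_t> &bounce)
{
    if (frags.size() <= kMaxHwFragments) {
        return frags;                   // fast path: no copy
    }
    size_t total = 0;
    for (const Fragment &f : frags) {
        total += f.length;
    }
    bounce.resize(total);
    size_t offset = 0;
    for (const Fragment &f : frags) {
        std::memcpy(bounce.data() + offset, f.data, f.length);
        offset += f.length;
    }
    return { Fragment{ bounce.data(), total } };  // one fragment after the copy
}

In the real driver the bounce buffer would have to be preallocated with a known physical address, as the MSDN note above suggests, and the copy is only worthwhile for small or highly fragmented packets.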
Host:
VM:
- Windows version: Windows Server 2019
- Which driver has a problem: Netkvm
- Driver version or commit hash that was used to build the driver: any version after 2020.07.14
Hi @zjmletang ,
Thank you for opening the issue.
Some questions and comments:
- Are you using a real HW virtio-net implementation? This would help us understand the motivation for adding this feature.
- Copying is not always more efficient; actually, in the case of larger packets, we definitely want to avoid it.
- In any case, the decision to copy will be based on the virtio-queue size, not just 64 SG fragments.
- Also, it might be more than one page. For Jumbo frames, we need more.
@ybendito Any thoughts?
Best regards, Yan.
- Yes, we use a real HW virtio-net implementation. Our hardware has a limitation on the number of physical pages per descriptor.
- I agree.
- I don't understand why it is related to the virtio-queue size. Could you please help explain?
- Yes, actually it may not necessarily be one page. What I meant was to limit the number of fragments when converting a NET_BUFFER into descriptors, as long as we can achieve the desired effect.
- Do you mean "packet," no? The descriptor will point to one SG fragment (if not using indirect descriptors). Or do you mean you have a problem when the indirect feature is on?
- Without an indirect feature, one descriptor will point to each fragment. I thought the problem is to process a certain amount of descriptors because of the limited queue size. If the limitation is the amount of fragments per packet, I suggest adjusting the virtio spec so the guest will be aware of the limitation.
- I mean I have a problem when the indirect feature is on. One descriptor has too many fragments, so it exceeds the hardware limitation.
- Yes, the limitation is the number of fragments per packet (see the code below). Before the protocol changes, would you consider adding a fixed hardware limitation, such as 64 or 256?
In this function:

SubmitTxPacketResult CTXDescriptor::Enqueue(CTXVirtQueue *Queue, ULONG TotalDescriptors, ULONG FreeDescriptors)

the following call is made:

    if (0 <= Queue->AddBuf(m_VirtioSGL, m_CurrVirtioSGLEntry, 0, this,
                           m_IndirectArea.GetVA(), m_IndirectArea.GetPA().QuadPart))
    {
        return SUBMIT_SUCCESS;
    }
m_VirtioSGL has too many fragments.
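For illustration only (this is not a tested patch; MAX_HW_FRAGMENTS and SUBMIT_FAILURE are placeholders, and the real limit would have to come from the device), a guard of the kind being discussed could sit right before the AddBuf call:

// Hypothetical guard inside CTXDescriptor::Enqueue, before AddBuf: if the
// accumulated SG entries exceed the device's per-packet limit, fall back to
// a copy/coalesce path or fail the submission instead of handing the device
// more fragments than it can accept.
if (m_CurrVirtioSGLEntry > MAX_HW_FRAGMENTS)      // assumed device cap
{
    // e.g. copy the payload into one preallocated contiguous buffer and
    // rebuild m_VirtioSGL with a single entry, or return an error here.
    return SUBMIT_FAILURE;                        // placeholder result code
}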
- According to the spec: "The device limits the number of descriptors in a list through a transport-specific and/or device-specific value. If not limited, the maximum number of descriptors in a list is the virt queue size."
- I prefer to adjust the specification first. For general usage, limiting the indirect descriptors will cause a performance hit. That means we would have to add a non-standard, non-default feature.
OK, thank you very much!
@YanVugenfirer It is not a trivial change, but I think it can be done with a subsystem/subvendor ID in the device (exactly as the spec allows) and with a driver that takes this ID (and possibly the revision) into account. We need to check, of course, that the Linux driver will also work correctly with this.
@YanVugenfirer And another option (probably requiring additional integration) is a vendor-specific configuration in the configuration chain, i.e. a little more complicated but more flexible.
@zjmletang Do you want to discuss our possible access to the HW over email? My email is yan (at) daynix.com
@YanVugenfirer Of course, I am very willing. My email is [email protected]
Hi @zjmletang and @YanVugenfirer ,
I am reproducing this issue and would like to show you the detailed steps. If there are any errors, please let me know. Thanks : )
Firstly, I would like to determine whether the problem exists between two virtual machines or between a virtual machine and a physical machine. Currently, my understanding is that a virtual machine runs this program as a client, and the server is a physical machine that only receives data. Is this correct?
The server code:

import socket

HOST = '<server_ip>'
PORT = 8001

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    while True:
        conn, addr = s.accept()
        with conn:
            print('Connected by', addr)
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                print(data)
                conn.sendall(data)
The code in the VM:

import random
import socket

while True:
    random1 = random.randint(0, 99)
    random2 = random.randint(0, 9999)
    message = "{},{},{}".format(random1, '12233333', random2).encode('utf-8')
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('<server_ip>', 8001))
    sock.sendall(message)
    sock.close()
I have a question for @zjmletang: where can I find the small TCP packets that exceed the 64-fragment limit? Is it possible that they come from here: random1, '12233333', random2? Based on my testing, it seems that nothing is lost on either the VM or the local side.
The above is the test run locally.
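For comparison with the Python client above (which opens a new connection for every message), here is a hedged C++/Winsock sketch of the traffic pattern described in the original report: many small sends over one long-lived connection with Nagle left enabled, so the stack can coalesce them into large, highly fragmented transmit packets. The server address is a placeholder.

#include <winsock2.h>
#include <ws2tcpip.h>
#include <cstdio>
#include <cstdlib>
#pragma comment(lib, "ws2_32.lib")

int main()
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) {
        return 1;
    }

    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8001);
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);   // placeholder server IP

    if (connect(s, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) != 0) {
        return 1;
    }

    char msg[64];
    for (;;) {  // continuous small sends over ONE connection, Nagle enabled
        int len = sprintf_s(msg, "%d,12233333,%d", rand() % 100, rand() % 10000);
        if (send(s, msg, len, 0) <= 0) {
            break;
        }
    }
    closesocket(s);
    WSACleanup();
    return 0;
}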
@heywji Did you try to reproduce the problem using a virtio-net adapter on QEMU with a Windows guest?
Hi @ybendito,
The version of QEMU is qemu-kvm-8.0.0-7.el9.x86_64, and the version of virtio-win is virtio-win-1.9.35-0.el9.iso.
Here is the virtio-net adapter part of my command line:
/usr/libexec/qemu-kvm -m 17408 \
-device '{"id": "pcie-root-port-3", "port": 3, "driver": "pcie-root-port", "addr": "0x1.0x3", "bus": "pcie.0", "chassis": 4}' \
-device '{"driver": "virtio-net-pci", "mac": "9a:44:d9:4f:bb:cf", "id": "idUyryQr", "netdev": "idxarPu0", "bus": "pcie-root-port-3", "addr": "0x0"}' \
-netdev '{"id": "idxarPu0", "type": "tap", "vhost": true, "vhostfd": "16", "fd": "12"}' \
Updates from RH on this: https://issues.redhat.com/browse/RHEL-18187