virtio-net enables features without checking if the peer supports them
Here we expose many features without confirming that the peer actually supports them:
```rust
let avail_features = 1 << VIRTIO_NET_F_GUEST_CSUM
    | 1 << VIRTIO_NET_F_CSUM
    | 1 << VIRTIO_NET_F_GUEST_TSO4
    | 1 << VIRTIO_NET_F_HOST_TSO4
    | 1 << VIRTIO_NET_F_GUEST_UFO
    | 1 << VIRTIO_NET_F_HOST_UFO
    | 1 << VIRTIO_NET_F_MAC
    | 1 << VIRTIO_RING_F_EVENT_IDX
    | 1 << VIRTIO_F_VERSION_1;
```
QEMU only enables checksum, TSO and UFO if the peer supports vnet_hdr. We should do the same thing and negotiate with the peer before enabling those features.
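For reference, QEMU clears the offload feature bits in virtio_net_get_features() when the peer lacks vnet header support; a paraphrased sketch (not verbatim QEMU code, from hw/net/virtio-net.c):

```c
/* Paraphrased from QEMU's virtio_net_get_features(): when the peer has
 * no vnet header support, the offload feature bits are cleared before
 * the guest can negotiate them. */
if (!peer_has_vnet_hdr(n)) {
    virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
    virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
    virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
    virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
    virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_UFO);
    virtio_clear_feature(&features, VIRTIO_NET_F_HOST_UFO);
}
```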
cc @mtjhrc
Looking at the QEMU code integrating with vmnet, it uses:
```c
static NetClientInfo net_vmnet_shared_info = {
    .type = NET_CLIENT_DRIVER_VMNET_SHARED,
    .size = sizeof(VmnetState),
    .receive = vmnet_receive_common,
    .cleanup = vmnet_cleanup_common,
};
```
So net_vmnet_shared_info.has_vnet_hdr is NULL, which means qemu_has_vnet_hdr() will always return false.
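This is because qemu_has_vnet_hdr() treats a missing callback as "no support"; paraphrased from QEMU's net/net.c (a sketch, not verbatim):

```c
/* Paraphrased from QEMU's net/net.c: a backend that does not implement
 * the has_vnet_hdr callback is treated as having no vnet header support. */
bool qemu_has_vnet_hdr(NetClientState *nc)
{
    if (!nc || !nc->info->has_vnet_hdr) {
        return false;
    }
    return nc->info->has_vnet_hdr(nc);
}
```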
I think QEMU could be improved to enable vmnet_enable_tso_key and vmnet_enable_checksum_offload_key, in which case it could also enable TSO and checksum offload in the virtio driver.
But this is the easy case when the same program controls both the virtio device and the vmnet interface. For libkrun we control the virtio device, but we don't know how the vmnet interface was configured. I think we need to add options to enable or disable offloading features.
When I enable the vmnet_enable_tso_key and vmnet_enable_checksum_offload_key options in vmnet-helper, it works with current krunkit/libkrun, but performance is very inconsistent. The best approach would be to make offloading configurable both in libkrun (via krunkit options) and in the program creating the vmnet interface. This way you can optimize for the use case.
Related libvirt options to control offloading: https://libvirt.org/formatdomain.html#setting-nic-driver-specific-options
vmnet offloading features (no public docs, documented in vmnet.h):
```c
/*!
 * @constant vmnet_enable_tso_key
 * Enable TCP segmentation offload. Note, when this is enabled, the
 * interface may generate large (64K) TCP frames. It must also
 * be prepared to accept large TCP frames as well.
 */
extern const char *
vmnet_enable_tso_key API_AVAILABLE(macos(11.0)) API_UNAVAILABLE(ios, watchos, tvos);

/*!
 * @constant vmnet_enable_checksum_offload_key
 * Enable checksum offload for this interface. The checksums that are
 * offloaded are: IPv4 header checksum, UDP checksum (IPv4 and IPv6),
 * and TCP checksum (IPv4 and IPv6).
 *
 * In order to perform the offload function, all packets flowing in and
 * out of the vmnet_interface instance are verified to pass basic
 * IPv4, IPv6, UDP, and TCP sanity checks. A packet that fails any
 * of these checks is simply dropped.
 *
 * On output, checksums are automatically computed as necessary
 * on each packet sent using vmnet_write().
 *
 * On input, checksums are verified as necessary. If any checksum verification
 * fails, the packet is dropped and not delivered to vmnet_read().
 *
 * Note that the checksum offload function for UDP and TCP checksums is unable
 * to deal with fragmented IPv4/IPv6 packets. The VM client networking stack
 * must handle UDP and TCP checksums on fragmented packets itself.
 */
extern const char *
vmnet_enable_checksum_offload_key API_AVAILABLE(macos(12.0)) API_UNAVAILABLE(ios, watchos, tvos);
```
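For context, a minimal sketch of how a host-side helper can opt in to these offloads when creating the interface (the completion handler body, error handling, and lifetime management are omitted):

```c
#include <dispatch/dispatch.h>
#include <vmnet/vmnet.h>

/* Sketch: start a shared-mode vmnet interface with offloading enabled.
 * A real helper must check status in the handler and keep the queue alive. */
static interface_ref start_with_offloads(void)
{
    xpc_object_t desc = xpc_dictionary_create(NULL, NULL, 0);
    xpc_dictionary_set_uint64(desc, vmnet_operation_mode_key, VMNET_SHARED_MODE);
    xpc_dictionary_set_bool(desc, vmnet_enable_tso_key, true);              /* macOS 11+ */
    xpc_dictionary_set_bool(desc, vmnet_enable_checksum_offload_key, true); /* macOS 12+ */

    dispatch_queue_t queue = dispatch_queue_create("vmnet", DISPATCH_QUEUE_SERIAL);
    return vmnet_start_interface(desc, queue,
        ^(vmnet_return_t status, xpc_object_t params) {
            /* Inspect status and the interface parameters here. */
        });
}
```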
Handling of QEMU dgram (-net dgram) looks similar:
```c
static NetClientInfo net_dgram_socket_info = {
    .type = NET_CLIENT_DRIVER_DGRAM,
    .size = sizeof(NetDgramState),
    .receive = net_dgram_receive,
    .cleanup = net_dgram_cleanup,
};
```
So with -net dgram, QEMU doesn't enable offloading, and this indeed works with vmnet-helper (which does not enable offloading by default).
Example of a device that may enable offloading:
```c
static NetClientInfo net_tap_info = {
    .type = NET_CLIENT_DRIVER_TAP,
    .size = sizeof(TAPState),
    .receive = tap_receive,
    .receive_iov = tap_receive_iov,
    .poll = tap_poll,
    .cleanup = tap_cleanup,
    .has_ufo = tap_has_ufo,
    .has_uso = tap_has_uso,
    .has_vnet_hdr = tap_has_vnet_hdr,
    .has_vnet_hdr_len = tap_has_vnet_hdr_len,
    .set_offload = tap_set_offload,
    .set_vnet_hdr_len = tap_set_vnet_hdr_len,
    .set_vnet_le = tap_set_vnet_le,
    .set_vnet_be = tap_set_vnet_be,
    .set_steering_ebpf = tap_set_steering_ebpf,
};
```
Yes, this doesn't seem to be correct. It is quite likely we shouldn't be enabling these flags even for the passt backend. The virtio-net device is based on Firecracker's (which used a TAP device), and I didn't really give a lot of thought to specifying the virtio features (mostly I wasn't sure what they were doing, and since it worked, I forgot about it).
So we should really remove the features, and have an API for enabling them (if they are useful or supported by the backend).
I did quick performance testing with and without TSO and checksum offload. It shows that in most cases disabling offloading improves performance significantly and gives more consistent results. However, in one test (iperf3 -R), enabling offloading is 5 times faster!
TSO and checksum offload enabled
I kept the feature bits in libkrun and enabled the vmnet offloading options.
| network | vm | iperf3 | iperf3 -R |
|---|---|---|---|
| gvproxy | krunkit | 1.40 Gbits/s | 20.00 Gbits/s |
| vmnet-helper shared | krunkit | 1.38 Gbits/s | 46.20 Gbits/s |
| vmnet-helper bridged | krunkit | 1.37 Gbits/s | 45.70 Gbits/s |
| vmnet-helper shared | vfkit | 4.27 Gbits/s | 8.09 Gbits/s |
| vmnet-helper bridged | vfkit | 4.30 Gbits/s | 10.50 Gbits/s |
TSO and checksum offload disabled
I removed the feature bits in libkrun and disabled the vmnet offloading options.
| network | vm | iperf3 | iperf3 -R |
|---|---|---|---|
| gvproxy | krunkit | 1.47 Gbits/s | 2.58 Gbits/s |
| vmnet-helper shared | krunkit | 10.10 Gbits/s | 8.38 Gbits/s |
| gvproxy | vfkit | 1.43 Gbits/s | 2.84 Gbits/s |
| vmnet-helper shared | vfkit | 10.70 Gbits/s | 8.41 Gbits/s |
| vmnet-helper bridged | vfkit | 12.10 Gbits/s | 11.50 Gbits/s |
I did not test any real workloads; maybe offloading can improve performance when connecting two VMs (possible with vmnet, not possible with gvproxy).
Test details
All tests used #263.
libkrun change to remove the feature bits:
```diff
diff --git a/src/devices/src/virtio/net/device.rs b/src/devices/src/virtio/net/device.rs
index 62695c1..788cffb 100644
--- a/src/devices/src/virtio/net/device.rs
+++ b/src/devices/src/virtio/net/device.rs
@@ -93,15 +93,14 @@ pub struct Net {
 impl Net {
     /// Create a new virtio network device using the backend
     pub fn new(id: String, cfg_backend: VirtioNetBackend, mac: [u8; 6]) -> Result<Self> {
-        let avail_features = 1 << VIRTIO_NET_F_GUEST_CSUM
-            | 1 << VIRTIO_NET_F_CSUM
-            | 1 << VIRTIO_NET_F_GUEST_TSO4
-            | 1 << VIRTIO_NET_F_HOST_TSO4
-            | 1 << VIRTIO_NET_F_GUEST_UFO
-            | 1 << VIRTIO_NET_F_HOST_UFO
-            | 1 << VIRTIO_NET_F_MAC
-            | 1 << VIRTIO_RING_F_EVENT_IDX
-            | 1 << VIRTIO_F_VERSION_1;
+        let avail_features =
+            1 << VIRTIO_NET_F_MAC | 1 << VIRTIO_RING_F_EVENT_IDX | 1 << VIRTIO_F_VERSION_1;
+        //1 << VIRTIO_NET_F_GUEST_CSUM
+        //| 1 << VIRTIO_NET_F_CSUM
+        //| 1 << VIRTIO_NET_F_GUEST_TSO4
+        //| 1 << VIRTIO_NET_F_HOST_TSO4
+        //| 1 << VIRTIO_NET_F_GUEST_UFO
+        //| 1 << VIRTIO_NET_F_HOST_UFO
```
@mtjhrc any thoughts on the API we want for enabling offloading?
Can we use one flag to enable all offloading features, or maybe more fine-grained flags like libvirt? https://libvirt.org/formatdomain.html#setting-nic-driver-specific-options
@nirs
> any thoughts on the API we want for enabling offloading?
I'd say I'd prefer more fine-grained flags, and accept them as a bitmask argument.
I'm not sure if the flags should be part of e.g. `krun_set_gvproxy_path2(uint32_t ctx_id, char *c_path, uint64_t flags)` or a separate function, e.g. `krun_set_net_flags(uint32_t ctx_id, uint64_t flags)`. The second option looks cleaner for now (we also have `krun_set_net_mac`), but if we ever wanted more than one virtio-net device, that would get messy, so eventually we really need to have a high-level discussion of our API design.
> > any thoughts on the API we want for enabling offloading?
>
> I'd say I'd prefer more fine-grained flags, and accept them as a bitmask argument. I'm not sure if the flags should be part of e.g.
>
> `krun_set_gvproxy_path2(uint32_t ctx_id, char *c_path, uint64_t flags)`
This seems wrong in many ways: assuming gvproxy when the same code supports other networking options (e.g. vmnet-helper), mixing unrelated stuff (unix socket path and virtio feature flags), and assuming that networking is provided only via a unix socket when it can also be provided by passing file descriptors.
> or a separate function, e.g.
>
> `krun_set_net_flags(uint32_t ctx_id, uint64_t flags)`. The second option looks cleaner for now (we also have `krun_set_net_mac`), but if we ever wanted more than one virtio-net device, that would get messy, so eventually we really need to have a high-level discussion of our API design.
Sounds like the right way, but the flags should be per virtio device.
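A minimal sketch of what such a per-device API could look like (all names and values here are hypothetical, not existing libkrun API):

```c
#include <stdint.h>

/* Hypothetical fine-grained offload flags, one bit per feature,
 * in the spirit of the libvirt NIC driver options. */
#define KRUN_NET_OFFLOAD_CSUM (1ULL << 0) /* VIRTIO_NET_F_CSUM + VIRTIO_NET_F_GUEST_CSUM */
#define KRUN_NET_OFFLOAD_TSO4 (1ULL << 1) /* VIRTIO_NET_F_HOST_TSO4 + VIRTIO_NET_F_GUEST_TSO4 */
#define KRUN_NET_OFFLOAD_UFO  (1ULL << 2) /* VIRTIO_NET_F_HOST_UFO + VIRTIO_NET_F_GUEST_UFO */

/* Hypothetical setter: net_idx selects the virtio-net device once multiple
 * devices are supported; flags default to 0 (no offloads advertised). */
int krun_set_net_offload(uint32_t ctx_id, uint32_t net_idx, uint64_t flags);
```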
> mixing unrelated stuff (unix socket path and virtio feature flags),
Yeah, right, the function name doesn't make sense at that point, so I should have picked a different name (e.g. `krun_set_net_gvproxy()`).
> Sounds like the right way, but the flags should be per virtio device.
Yes, definitely. For now they could be global, I guess (because we only support one virtio-net device). Eventually we might want to do something like we are discussing for virtio-console for all devices: https://github.com/containers/libkrun/pull/363#issuecomment-3007886448. It's not really necessary for now, but it would be nice to clean it up eventually.
At this very moment I'm refactoring the network configuration API to support multiple network interfaces. I'm going to incorporate the ability to specify the offloading flags too.
This is the PR https://github.com/containers/libkrun/pull/367
@slp we can close as fixed by #367.