firecracker
[Bug] virtio-net: Guest can't NACK LRO
Describe the bug
When FreeBSD boots in Firecracker, it disables LRO features because it doesn't want to allocate 64 kB buffers. (Maybe this should change in FreeBSD, but that's how our virtio-net driver currently does things.)
Firecracker computes the acked_features value but seems to completely ignore it(?); in any case, Firecracker assumes that the guest can handle >1536 byte segments despite FreeBSD disabling the relevant options. This results in the device hanging, since Firecracker repeatedly attempts (and fails) to write a packet which doesn't fit onto the guest I/O ring.
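To make the complaint concrete: under virtio feature negotiation the device offers a feature mask, the driver writes back the subset it accepts, and the device is expected to base later behaviour on that acked subset. The Rust sketch below is a minimal illustration, not Firecracker's code; `negotiate` and `max_rx_frame_len` are hypothetical names. It shows how the largest frame the device may write into a single receive buffer follows from the acked bits. The feature bit numbers and the 1526/65562-byte buffer sizes are assumed from the virtio 1.x spec's guidance for drivers that do not negotiate VIRTIO_NET_F_MRG_RXBUF and should be double-checked.

```rust
// virtio-net feature bit positions (receive-side offloads), per the virtio spec.
const VIRTIO_NET_F_GUEST_TSO4: u32 = 7;
const VIRTIO_NET_F_GUEST_TSO6: u32 = 8;
const VIRTIO_NET_F_GUEST_UFO: u32 = 10;
const VIRTIO_NET_F_MRG_RXBUF: u32 = 15;

// Minimum receive-buffer sizes the spec asks of a driver that has NOT
// negotiated VIRTIO_NET_F_MRG_RXBUF (assumed values; verify against the spec).
const SMALL_RX_BUF: usize = 1526;
const LARGE_RX_BUF: usize = 65562;

/// Mask with only feature bit `b` set.
fn bit(b: u32) -> u64 {
    1u64 << b
}

/// Hypothetical negotiation step: keep only the bits the device offered
/// AND the driver wrote back.
fn negotiate(avail_features: u64, driver_features: u64) -> u64 {
    avail_features & driver_features
}

/// Hypothetical helper: largest frame (virtio-net header included) the device
/// may write into one RX buffer, derived from the *acked* features.
fn max_rx_frame_len(acked_features: u64) -> usize {
    let large_offloads =
        bit(VIRTIO_NET_F_GUEST_TSO4) | bit(VIRTIO_NET_F_GUEST_TSO6) | bit(VIRTIO_NET_F_GUEST_UFO);
    if acked_features & large_offloads != 0 {
        LARGE_RX_BUF
    } else {
        SMALL_RX_BUF
    }
}

fn main() {
    // The device offers all three receive offloads.
    let avail =
        bit(VIRTIO_NET_F_GUEST_TSO4) | bit(VIRTIO_NET_F_GUEST_TSO6) | bit(VIRTIO_NET_F_GUEST_UFO);

    // A FreeBSD-like driver that declines every receive offload.
    let declined = negotiate(avail, 0);
    // A driver that accepts TSO4 and also asks for MRG_RXBUF, which was never offered.
    let accepted = negotiate(avail, bit(VIRTIO_NET_F_GUEST_TSO4) | bit(VIRTIO_NET_F_MRG_RXBUF));

    println!("{}", max_rx_frame_len(declined)); // 1526
    println!("{}", max_rx_frame_len(accepted)); // 65562
}
```

The key point is that `max_rx_frame_len` reads the negotiated mask, so a guest that declines the offloads never receives frames sized for them.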
To Reproduce
- Boot FreeBSD in Firecracker.
- Attempt to git clone a repository from GitHub.
I'm not sure why, but GitHub always seems to trigger this, while I've never seen it with other networking. Possibly because GitHub is very close to my test instance in us-east-1?
Expected behaviour
Firecracker should pay attention to the features accepted by the guest, and not send large segments if the guest has said it doesn't want them.
Firecracker should probably also drop packets rather than entering an infinite loop trying to write them into a buffer which they don't fit into. Obviously this isn't a trivial change since buffer space might open up later; I don't know enough about these bits to suggest the right solution here.
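One way to avoid the livelock described above, sketched under stated assumptions: classify each frame before writing it. A frame larger than anything the guest negotiated can never fit and can be dropped immediately, while a frame that is only waiting for the guest to post a buffer can be deferred. `RxAction` and `rx_action` are hypothetical names, and the sketch assumes VIRTIO_NET_F_MRG_RXBUF is not negotiated, so each frame must fit in a single buffer.

```rust
/// Hypothetical policy for one frame read from the tap device.
#[derive(Debug, PartialEq)]
enum RxAction {
    Deliver, // write it into the next RX buffer
    Defer,   // keep it and retry once the guest posts a buffer
    Drop,    // discard it; retrying can never succeed
}

/// `frame_len`: bytes to write, virtio-net header included.
/// `max_len`: largest frame the guest agreed to accept (from its acked features).
/// `next_buf_len`: writable bytes in the next available RX buffer, if any.
/// Assumes VIRTIO_NET_F_MRG_RXBUF is not negotiated, so a frame must fit in one buffer.
fn rx_action(frame_len: usize, max_len: usize, next_buf_len: Option<usize>) -> RxAction {
    if frame_len > max_len {
        // Larger than the guest ever agreed to: waiting for buffer space cannot help.
        return RxAction::Drop;
    }
    match next_buf_len {
        // No buffer available right now; space may open up later.
        None => RxAction::Defer,
        Some(len) if len >= frame_len => RxAction::Deliver,
        // The guest posted a buffer smaller than the negotiated size; dropping
        // the frame avoids retrying the same failed write forever.
        Some(_) => RxAction::Drop,
    }
}

fn main() {
    // A 64 KiB TSO frame aimed at a guest that declined receive offloads.
    println!("{:?}", rx_action(65562, 1526, Some(1526))); // Drop
    // A full-size normal frame while the RX queue is momentarily empty.
    println!("{:?}", rx_action(1526, 1526, None)); // Defer
    // A full-size normal frame that fits.
    println!("{:?}", rx_action(1526, 1526, Some(1526))); // Deliver
}
```

The `Defer` branch keeps the "buffer space might open up later" case working; only frames that can never fit are discarded.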
Environment
Firecracker @ 1ca6f8c90c683c48fc928836e0d607bc8a3855c9 plus PVH support patches. Linux 5.15 host; FreeBSD 14.0 guest; amd64 architecture.
Additional context
Checks
- [X] Have you searched the Firecracker Issues database for similar problems?
- [X] Have you read the existing relevant Firecracker documentation?
- [X] Are you certain the bug being reported is a Firecracker issue?
Related: #2433
Hi, thanks for reporting this issue. At the moment this issue affects only FreeBSD guests. Mainline Linux is not currently impacted, but that does not exclude the possibility that it will be in the future. This request is valid; as a next step we need to design a solution that fits Firecracker and also works with our supported guest OSes.
I've been experimenting with a FreeBSD guest for a GitHub Actions CI use case and ran into this as well. In case it helps anyone, here are two workarounds:
- Patch Firecracker so that MAX_BUFFER_SIZE is in the range of MTU values that can be used with FreeBSD's virtio-net driver (<= 16k, I think). In /etc/rc.conf, append mtu 16384 (or whatever max buffer size you used) to the ifconfig_vtnet0 line. With this option, I can get good throughput (10+ MB/s) on my home cable connection from sites like GitHub. Curiously, though, throughput from FreeBSD package mirrors is extremely low, often <200 KB/s. Downloading the same .pkg files from the host is much faster. Similar results in GitHub Actions runners.
  Throughput improves somewhat when the guest's interface is configured with a larger MTU, perhaps because FreeBSD's driver is scaling its receive buffer sizes based on the MTU value.
- After the guest has booted up, disable TCP segmentation offload (TSO) on the host with something like ethtool -K tap0 tso off. This needs to be done after each launch because it gets re-enabled, even if I disable TSO4 in Firecracker's available features list. I haven't tested this option thoroughly yet, but throughput seems good when downloading from all hosts.
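The second workaround switches TSO off on the tap device by hand. A device-side equivalent would be to derive the tap's offload flags from the features the guest acked and pass them to the Linux TUNSETOFFLOAD ioctl, rather than assuming the offloads will be accepted. The sketch below only computes the flags (it does not issue the ioctl); `tap_offload_flags` is a hypothetical name, gating the segmentation flags on GUEST_CSUM is an assumption, and none of this describes what Firecracker or any later fix actually does.

```rust
// Offload flags for the Linux TUNSETOFFLOAD ioctl (values from <linux/if_tun.h>).
const TUN_F_CSUM: u32 = 0x01;
const TUN_F_TSO4: u32 = 0x02;
const TUN_F_TSO6: u32 = 0x04;
const TUN_F_UFO: u32 = 0x10;

// virtio-net receive-side feature bits.
const VIRTIO_NET_F_GUEST_CSUM: u32 = 1;
const VIRTIO_NET_F_GUEST_TSO4: u32 = 7;
const VIRTIO_NET_F_GUEST_TSO6: u32 = 8;
const VIRTIO_NET_F_GUEST_UFO: u32 = 10;

/// True if feature bit `bit` is set in the acked mask.
fn acked(features: u64, bit: u32) -> bool {
    features & (1u64 << bit) != 0
}

/// Hypothetical mapping from the guest's acked features to tap offload flags,
/// so the tap only hands the VMM frames the guest agreed to receive.
fn tap_offload_flags(acked_features: u64) -> u32 {
    let mut flags = 0;
    // Assumption: segmentation offloads are only enabled together with checksum offload.
    if acked(acked_features, VIRTIO_NET_F_GUEST_CSUM) {
        flags |= TUN_F_CSUM;
        if acked(acked_features, VIRTIO_NET_F_GUEST_TSO4) {
            flags |= TUN_F_TSO4;
        }
        if acked(acked_features, VIRTIO_NET_F_GUEST_TSO6) {
            flags |= TUN_F_TSO6;
        }
        if acked(acked_features, VIRTIO_NET_F_GUEST_UFO) {
            flags |= TUN_F_UFO;
        }
    }
    flags
}

fn main() {
    // Guest that acked checksum offload but declined LRO-style offloads.
    println!("{:#x}", tap_offload_flags(1u64 << VIRTIO_NET_F_GUEST_CSUM)); // 0x1
    // Guest that acked GUEST_CSUM and GUEST_TSO4.
    let acked_features = (1u64 << VIRTIO_NET_F_GUEST_CSUM) | (1u64 << VIRTIO_NET_F_GUEST_TSO4);
    println!("{:#x}", tap_offload_flags(acked_features)); // 0x3
}
```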
@bchalios I saw your post about this in Slack:
We definitely assume that UFO & TSO4 will be accepted, so we assume that frames will always fit in the buffers that will be provided by the guest. We can fix that.
Would you accept a PR for this? I'm willing to work on a proper fix but wanted to ask before digging in.
Hey @acj, we had already started working on this. We should be able to post a PR in a week or so. I can ping you once it's out so you can let me know what you think. Does that work for you?
This issue affect at the moment only FreeBSD guestOS. Linux mainline currently is not impacted but this does not exclude the possibility that will be impacted in the future.
I'm not really sure that it's not impacted. eBPF support (which is currently all the rage on Linux) seems to fail miserably because of this bug (and some others).
Hi @pkit, could you please provide a reproducer for the issue? We do have some work in progress to fix this; it just got de-prioritized due to other work. One thing still missing is an integration test to make sure our fix works on Linux systems.
#4680 changes Firecracker to properly negotiate features with the guest. I will close this now, but please let us know if this problem persists.