firecracker
firecracker copied to clipboard
Support reset for `net` device
Changes
This PR implements support for resetting the net device. Resolves #3074.
Reason
The VirtIO spec says in section 3.1.1:
The driver MUST follow this sequence to initialize a device:
- Reset the device.
Operating systems like FreeBSD and NetBSD reset the device as part of initializing it, which currently causes Firecracker to put the device into a FAILED state. As noted in #3074, some guest OSes continue to have functional networking because they're unaware that the device is in a failed state, but we shouldn't rely on this behavior.
Notes
- This is partly an RFC because I'm not familiar enough with the vmm code to be confident that the device is being reset properly. Still, this does work in my Linux and FreeBSD test cases. Linux doesn't seem to reset the net device. FreeBSD does reset it, and then the device is activated and becomes usable.
reset()returns a copy of the interrupt FD and queue event FDs, but the MMIO driver does not use them. It's unclear to me how they should be used. Is that handshake only needed for specific drivers? If there were a way to signal that reset had succeeded without needing to clone the FDs, then I could remove the seccomp changes. Perhaps the return type could beResult<Option<...>>instead.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following
Developer Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
- [X] If a specific issue led to this PR, this PR closes the issue.
- [X] The description of changes is clear and encompassing.
- [X] Any required documentation changes (code and docs) are included in this PR.
- [ ] API changes follow the Runbook for Firecracker API changes.
- [ ] User-facing changes are mentioned in
CHANGELOG.md. - [X] All added/changed functionality is tested.
- [ ] New
TODOs link to an issue. - [X] Commits meet contribution quality standards.
- [X] This functionality cannot be added in
rust-vmm.
Add two reviewers to your pull request (a maintainer will do that for you if you're new).
@kalyazin @sudanl0 👋
Codecov Report
Attention: Patch coverage is 70.58824% with 5 lines in your changes are missing coverage. Please review.
Project coverage is 81.51%. Comparing base (
43aae6e) to head (a3f3ad9). Report is 210 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| src/vmm/src/devices/virtio/mmio.rs | 33.33% | 4 Missing :warning: |
| src/vmm/src/devices/virtio/mod.rs | 0.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #4389 +/- ##
==========================================
- Coverage 81.52% 81.51% -0.01%
==========================================
Files 241 241
Lines 29296 29308 +12
==========================================
+ Hits 23883 23891 +8
- Misses 5413 5417 +4
| Flag | Coverage Δ | |
|---|---|---|
| 4.14-c5n.metal | 78.79% <70.58%> (-0.01%) |
:arrow_down: |
| 4.14-c7g.metal | ? |
|
| 4.14-m5d.metal | ? |
|
| 4.14-m5n.metal | 78.78% <70.58%> (-0.01%) |
:arrow_down: |
| 4.14-m6a.metal | 77.90% <70.58%> (-0.01%) |
:arrow_down: |
| 4.14-m6g.metal | 76.86% <70.58%> (-0.01%) |
:arrow_down: |
| 4.14-m6i.metal | 78.77% <70.58%> (-0.01%) |
:arrow_down: |
| 4.14-m7g.metal | 76.86% <70.58%> (-0.01%) |
:arrow_down: |
| 5.10-c5n.metal | 81.48% <70.58%> (-0.01%) |
:arrow_down: |
| 5.10-c7g.metal | ? |
|
| 5.10-m5d.metal | ? |
|
| 5.10-m5n.metal | 81.47% <70.58%> (-0.01%) |
:arrow_down: |
| 5.10-m6a.metal | 80.68% <70.58%> (-0.01%) |
:arrow_down: |
| 5.10-m6g.metal | 79.79% <70.58%> (-0.01%) |
:arrow_down: |
| 5.10-m6i.metal | 81.46% <70.58%> (-0.01%) |
:arrow_down: |
| 5.10-m7g.metal | 79.79% <70.58%> (-0.01%) |
:arrow_down: |
| 6.1-c5n.metal | 81.48% <70.58%> (-0.01%) |
:arrow_down: |
| 6.1-c7g.metal | ? |
|
| 6.1-m5d.metal | ? |
|
| 6.1-m5n.metal | 81.47% <70.58%> (-0.01%) |
:arrow_down: |
| 6.1-m6a.metal | 80.68% <70.58%> (-0.01%) |
:arrow_down: |
| 6.1-m6g.metal | 79.79% <70.58%> (-0.01%) |
:arrow_down: |
| 6.1-m6i.metal | 81.46% <70.58%> (-0.01%) |
:arrow_down: |
| 6.1-m7g.metal | 79.79% <70.58%> (-0.01%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Rebased. At least some of the build failures were from the branch being stale, I think. The rest ("failed to run custom build command") look like they might be related to seccomp, but there's not much output.
Hi @acj, thank you for your contribution, and sorry for taking a while to respond.
Rebased. At least some of the build failures were from the branch being stale, I think. The rest ("failed to run custom build command") look like they might be related to seccomp, but there's not much output.
I think this is because there is no readlink syscall on ARM, instead the seccomp entry needs to be readlinkat.
However, my team talked about this internally and I think we can avoid adding new syscalls. The callsite of reset does not make use of the returned file descriptors (and I'm also not sure what it could use them for), so I think we can simply change the signature of reset to return nothing, saving us the need to dup the fds (and thus the syscalls). cc @bchalios because he has more context on this.
@roypat Hi! No worries, and thanks for the feedback. Changing the reset signature sounds good.
I went with Result as the return type so that the existing reset behavior doesn't change too much, i.e. set_device_status will still mark the device as failed if reset isn't implemented.
@bchalios Thanks, that's very helpful. I'll take a closer look at the TAP handling.
It is not feasible for us to maintain FreeBSD artifacts for testing, so it would need to be done using Linux drivers. However, I haven't found a way to trigger a reset through Linux.
I think this might be an issue on the Firecracker side. My understanding of Linux's virtio implementation is crude at best, but it looks like it should reset the device during registration. I added logging code and verified that the Linux driver sets status to 0 during boot, prior to activation. But Firecracker ignores this because it only calls reset if the device is already activated. Do you think it would be okay to handle a reset even if the device is inactive?
Hey @acj. Really sorry for silence on this. We got context switched in other topics.
I think this might be an issue on the Firecracker side. My understanding of Linux's virtio implementation is crude at best, but it looks like it should reset the device during registration. I added logging code and verified that the Linux driver sets status to 0 during boot, prior to activation. But Firecracker ignores this because it only calls reset if the device is already activated. Do you think it would be okay to handle a reset even if the device is inactive?
It is. We don't support reset :) I had seen this reset there when I looked at resetting some time ago, but I'm not sure it is enough for testing that reset works (I might be convinced otherwise :)). Ideally, what we could do is load/unload the virtio-net module in the guest. But we explicitly don't want to use modules in the guest for safety reasons.