firecracker
Allow snapshot tap changes
Changes
Allow renaming of tap devices on snapshot restore
Reason
In some scenarios it is not possible to use the jailer, especially in limited-privilege environments where security is enforced outside Firecracker itself. In these cases a snapshot may have to be restored with a different tap device than the one it was using when the snapshot was taken.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
- [X] If a specific issue led to this PR, this PR closes the issue.
- [X] The description of changes is clear and encompassing.
- [x] Any required documentation changes (code and docs) are included in this PR.
- [X] API changes follow the Runbook for Firecracker API changes.
- [x] User-facing changes are mentioned in CHANGELOG.md.
- [X] All added/changed functionality is tested.
- [X] New TODOs link to an issue.
- [X] Commits meet contribution quality standards.
- [X] This functionality cannot be added in rust-vmm.
Hi @andrewla, thank you for your contribution! We would like to understand the use case better, in case it can be resolved through other means first. We recommend using a network namespace, where you can create TAP devices with the same name, but that probably requires CAP_SYS_ADMIN, which I understand is what you mean by "limited privilege environments".
Could you elaborate on your use case? Is there a way you could create the namespace in a privileged setting and then use something like nsenter firecracker ...?
That assessment is correct -- basically to run the jailer in a network namespace you need the setns syscall which requires CAP_SYS_ADMIN. So nsenter is not an option.
Our particular case is running in a containerized environment where our privileges are limited by the nature of the general environment. Once we're in our particular container we have lost all relevant privileges.
Hi again @andrewla, we have been talking internally about this PR and we may need to spend some time to decide on the API aspects of it to make sure it doesn't conflict with other efforts.
In the meantime, we thought of another workaround. The snapshot-editor could be enhanced to rename the tap devices in a snapshot file. That would be an easier decision for us, but we want to make sure it would handle your use case.
For example we imagine the tool would work like this:
snapshot-editor edit-vmstate rename-network eth0 tap1
Would this work within your environment?
This was our initial approach, as it required minimal changes. But we found that the performance cost of making the copy (as opposed to hardlinking) during the operation, plus the serde costs, was higher than we were willing to tolerate in our environment.
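To illustrate the copy-versus-hardlink point, here is a minimal sketch in Python. The file names and helper functions are hypothetical and are not part of Firecracker or the snapshot-editor. It only shows why the two differ: a hardlink adds a second name for the same inode without touching the data, while a copy (which an in-place edit of the snapshot file would need in order to leave the original intact) rewrites every byte.

```python
import os
import shutil
import tempfile


def stage_by_hardlink(src, dst):
    """Make dst a second name for the same inode; no file data is copied."""
    os.link(src, dst)


def stage_by_copy(src, dst):
    """Duplicate the file contents; cost grows with the file size."""
    shutil.copyfile(src, dst)


def demo():
    """Return (hardlink shares inode, copy shares inode) for a dummy file."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical stand-in for a snapshot state file.
        src = os.path.join(workdir, "vm.snap")
        with open(src, "wb") as f:
            f.write(b"\0" * 4096)

        linked = os.path.join(workdir, "linked.snap")
        copied = os.path.join(workdir, "copied.snap")
        stage_by_hardlink(src, linked)
        stage_by_copy(src, copied)

        return os.path.samefile(src, linked), os.path.samefile(src, copied)


if __name__ == "__main__":
    print(demo())
```

Because the hardlinked file shares its contents with the original, editing it would also change the original, which is why an editing tool would have to work on a full copy.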
Hi @pb8o -- is there anything we can do to help move this forward?
Hi @andrewla I haven't had time to look at this, but this is next on my list now. Thanks for your patience!
On a related note, another reason why renaming the tap device is a better approach than namespaced NAT from the "Network for Clones" guide is that the namespaced NAT imposes measurable overhead onto the host kernel due to the addition of about 5 more iptables/nft rules, plus an RTNETLINK route for forwarding the guest IP out of the netns.
Even though I made an effort to support namespaced NAT in fcnet, it increased complexity by a factor of 4-5 compared to regular NAT, only to support one use case: two simultaneous microVM clones. So I'd be in favor of this change, or a snapshot-editor equivalent.
Hello @andrewla! I apologize for the long time between updates, but some other stuff came up. We have decided to go ahead with this. I did an initial review and only have some minor comments; it mostly looks good to me. My one question is whether the network_overrides field also works when starting from a JSON config file.
Re: config -- currently there is no config support for snapshots (https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/resources.rs) -- snapshot configuration and restore have to be done with a running firecracker instance.
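For readers following along, here is a rough sketch in Python of what a snapshot load request body carrying the override might look like when sent to a running instance. Only the `network_overrides` field name comes from this thread. The per-entry keys `iface_id` and `host_dev_name`, the other body fields, and the helper name are assumptions for illustration and should be checked against the API specification in this PR.

```python
import json


def build_snapshot_load_body(snapshot_path, mem_file_path, overrides):
    """Build a JSON body for a snapshot load request with tap renames.

    `overrides` maps a guest interface id to the host tap device name to
    use on restore. All field names below other than `network_overrides`
    are assumptions for this sketch.
    """
    body = {
        "snapshot_path": snapshot_path,
        "mem_backend": {
            "backend_type": "File",
            "backend_path": mem_file_path,
        },
        "resume_vm": True,
    }
    if overrides:
        # One entry per interface whose tap device should be replaced.
        body["network_overrides"] = [
            {"iface_id": iface_id, "host_dev_name": tap_name}
            for iface_id, tap_name in overrides.items()
        ]
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Hypothetical paths and device names.
    print(build_snapshot_load_body("vm.snap", "vm.mem", {"eth0": "tap1"}))
```

The body would then be sent over the API socket of the running firecracker process, since, as noted above, there is no config-file path for snapshot restore.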
Codecov Report
Attention: Patch coverage is 21.42857% with 11 lines in your changes missing coverage. Please review.
Project coverage is 83.14%. Comparing base (4e9b215) to head (adf9d4a). Report is 3 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vmm/src/persist.rs | 15.38% | 11 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #4731 +/- ##
==========================================
- Coverage 83.18% 83.14% -0.04%
==========================================
Files 248 248
Lines 26910 26923 +13
==========================================
+ Hits 22384 22386 +2
- Misses 4526 4537 +11
| Flag | Coverage Δ | |
|---|---|---|
| 5.10-c5n.metal | 83.53% <21.42%> (-0.04%) |
:arrow_down: |
| 5.10-m5n.metal | 83.51% <21.42%> (-0.04%) |
:arrow_down: |
| 5.10-m6a.metal | 82.71% <21.42%> (-0.04%) |
:arrow_down: |
| 5.10-m6g.metal | 79.56% <21.42%> (-0.04%) |
:arrow_down: |
| 5.10-m6i.metal | 83.51% <21.42%> (-0.04%) |
:arrow_down: |
| 5.10-m7g.metal | 79.56% <21.42%> (-0.04%) |
:arrow_down: |
| 6.1-c5n.metal | 83.58% <21.42%> (-0.03%) |
:arrow_down: |
| 6.1-m5n.metal | 83.56% <21.42%> (-0.03%) |
:arrow_down: |
| 6.1-m6a.metal | 82.75% <21.42%> (-0.04%) |
:arrow_down: |
| 6.1-m6g.metal | 79.56% <21.42%> (-0.04%) |
:arrow_down: |
| 6.1-m6i.metal | 83.55% <21.42%> (-0.05%) |
:arrow_down: |
| 6.1-m7g.metal | 79.56% <21.42%> (-0.04%) |
:arrow_down: |
It turns out that the test for renaming devices was failing when run with other tests that used network devices. After some experimentation, it seems that we are not cleaning up network devices from other tests, and modifying a network device results in an incompatible network configuration, rendering the VM unreachable.
For now I've patched this by having the new test use an unallocated network device, but I'm not sure if we're comfortable with this or if we want to try to figure out why the test passes when run alone but not when run in tandem with other tests.
I have applied the changes suggested by @pb8o. I also squashed all test commits into a single commit and moved some code into the appropriate commits.