vz: implement auto save/restore
- [x] Only `aarch64` is supported due to lack of support in `Code-Hex/vz/v3`.
- [x] Attempt to resume if the `vz-machine-state` file is found in the instance directory. If this fails, proceed with a cold boot.
- [x] On stop, always attempt to suspend to the `vz-machine-state` file; if this fails, perform a shutdown.
- [x] Guest OS and Guest Agent behavior are not considered here. Handling may be needed for suspending while port forwarding is active.
I will mark this as Ready for Review once the following are addressed:
- [x] Add `.saveOnStop` to `limayaml`.
- [x] `limactl resume`: Not needed.
- [x] `limactl start`: If the saved state is still present, restore.
- [x] Add `limactl save`: Request `.saveOnStop = true` to `ha.sock` and send SIGINT to `hostagent`.
- [x] `limactl stop`: Request `.saveOnStop = false` to `ha.sock` and send SIGINT to `hostagent`.
- [x] `hostagent`: Upon receiving SIGINT, if `.saveOnStop` is true, save; if false, shutdown.
- [x] `hostagent`: Add an API accepting request to update `.saveOnStop` to `ha.sock`.
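As a rough model of the flag-then-signal flow in this list: the client first updates `.saveOnStop` and then interrupts the hostagent, which decides what to do based on the current flag. The `agent` type and its methods below are hypothetical. The `ha.sock` request and the SIGINT delivery are both modeled as plain method calls, so this shows only the decision logic, not the actual transport.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// agent is a hypothetical stand-in for the hostagent process.
type agent struct {
	// saveOnStop mirrors the .saveOnStop setting; it is atomic because the
	// API handler and the signal handler would run on different goroutines.
	saveOnStop atomic.Bool
}

// SetSaveOnStop models the request a client sends over ha.sock.
func (a *agent) SetSaveOnStop(v bool) {
	a.saveOnStop.Store(v)
}

// OnInterrupt models what the hostagent does when SIGINT arrives:
// save when the flag is true, shut down otherwise.
func (a *agent) OnInterrupt() string {
	if a.saveOnStop.Load() {
		return "save"
	}
	return "shutdown"
}

// limactlSave and limactlStop model the two client commands from the list.
func limactlSave(a *agent) string {
	a.SetSaveOnStop(true)
	return a.OnInterrupt()
}

func limactlStop(a *agent) string {
	a.SetSaveOnStop(false)
	return a.OnInterrupt()
}

func main() {
	a := &agent{}
	fmt.Println("limactl save ->", limactlSave(a))
	fmt.Println("limactl stop ->", limactlStop(a))
}
```

With this split, a SIGINT from any other source (for example launchd) follows whatever value `.saveOnStop` last had.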
Items to be addressed in a separate PR due to the potential evolution into a snapshot implementation:
- [ ] `limactl save`: If `lima.yaml` is changed since starting VM, prompt `limactl stop`, or `limactl reboot` to apply changes to VM.
- [ ] `hostagent`: Use the same `lima.yaml` and `cidata.iso` for both save and restore.
  - on starting VM, keep `lima.yaml` and `cidata.iso` for next restoring.
  - on restoring VM, use `lima.yaml` and `cidata.iso` kept on starting.
Future needs not included in this PR:
- [ ] `limactl reboot`: TBD
~~This implements part of # 598.~~ I’ll withdraw the reference to the issue since it seems to be focused specifically on QEMU.
Using suspend/resume shortens the startup time of the Docker VM I use from 37 seconds to 13 seconds.
Do you have an issue describing the feature?
Did you look at the PR #642
> Did you look at the PR #642

Yes, initially I started implementing this as an extension of #642, but since it didn’t align at all with the behavior of the API provided by Virtualization.framework, I implemented it from scratch.
> I implemented it from scratch.

Ah, however, this PR does not implement limactl suspend or limactl resume. Instead, an option is added to limayaml that allows start/stop to perform resume/suspend actions.
You might want to add that as a separate feature, i.e. describe the auto-suspend
In addition to vz-machine-state, it might be necessary to save the cidata.iso and lima.yaml at the point of suspend for use during resume.
> In addition to `vz-machine-state`, it might be necessary to save the `cidata.iso` and `lima.yaml` at the point of suspend for use during resume.

And to apply the latest lima.yaml to the guest, a cold boot might be required.
> to apply the latest `lima.yaml` to the guest, a cold boot might be required.

limactl edit already requires the machine to be stopped.
> > to apply the latest `lima.yaml` to the guest, a cold boot might be required.
>
> `limactl edit` already requires the machine to be stopped.

I always edit lima.yaml in VSCode. 😝
> Eventually we could add a `limactl reboot` (or `restart`) command because I think the only time you really want to shutdown the instance is when you want a fresh boot.

A restart command might be needed sooner than I expected. Currently, I restart using:
launchctl kickstart -kp gui/501/io.lima-vm.autostart.docker; tail -F ~/.lima/docker/{ha.stderr,launchd.stderr,serialv}.log
However, with launchctl, it ends up performing suspend/resume instead, making it impossible to actually restart.
> I always edit `lima.yaml` in VSCode. 😝

So how would you deal with this?
When you suspend, but the lima.yaml file has been edited more recently than the cidata.iso, then you will touch the lima.yaml file to make sure it is newer than the saved state.
And when you start, and the lima.yaml has been edited more recently than the vz-machine-state, then you will ask the user if they want to drop the suspended state?
I guess that works, but is maybe too magical?
Perhaps it would be better to shut down instead of suspending if the lima.yaml has been modified since startup?
> And when you `start`, and the `lima.yaml` has been edited more recently than the `vz-machine-state`, then you will ask the user if they want to drop the suspended state?

Exactly. For that, I’m thinking of saving the lima.yaml and cidata.iso from the time of startup. Resume would always use those saved versions, while only a cold boot would use the latest lima.yaml.
> Perhaps it would be better to shut down instead of suspending if the `lima.yaml` has been modified since startup?

Maybe, but you would lose all your running state/workloads, so shouldn't there be a prompt?
That's why limactl edit requires you to stop first.
Now that I think about it more, I think if you edit lima.yaml outside of limactl edit, then it is your responsibility to do a limactl reboot to trigger the changes to take effect. Lima itself should continue to suspend/resume without worrying about your changes.
Would it be better if limactl suspend detects changes to lima.yaml and prompts for either limactl stop or limactl reboot, instead of having hostagent detect it when receiving SIGINT?
> Would it be better if `limactl suspend` detects changes to `lima.yaml` and prompts for either `limactl stop` or `limactl reboot`

This does not work if the suspend happens in the background, e.g. because you are logging out. Same thing when you automatically resume. In both cases you have no chance to prompt the user.
If the user edits lima.yaml without stopping the instance, then the user is responsible for eventually applying the changes by requesting a shutdown. We can ask when the instance is stopped or resumed, and stdin/stdout are both connected to a TTY. But otherwise the changes should be ignored.
Updated PR description.
Documentation is missing. It is better to start with documentation, before implementing. This helps to discuss the big picture, evaluate the user experience, and is much quicker to iterate on.
Updated commits and PR description to align the terminology with libvirt:
- suspend -> save
- resume -> restore
- `.suspendOnSigInt` -> `.saveOnStop`
Why not use the same names as in the qemu implementation? (suspend/resume)
We already used "save" (and "load") for saving/loading a snapshot, internally that is.
> Why not use the same names as in the qemu implementation?

Because its behavior differs from the QEMU implementation.
In the QEMU implementation, suspend does not save the VM state to disk, and resume does not restore the saved VM state.
Right, then it is a good thing to use different names. Actually the commands were called STOP and CONT in qemu.
For suspending to disk, the state is also called SUSPEND (standby/S3) or SUSPEND_DISK (hibernate/S4). My bad.
The changes to ensure that the same lima.yaml and cidata.iso are used for save/restore might potentially evolve into a snapshot implementation. Therefore, I would like to address those in a separate PR.
I've applied the review suggestions.
I made changes to past commits and force-pushed, so please fetch the latest and review again.
I force-pushed to remove a stray code block that had no effect.
I have other priorities to focus on, so I converted this to a draft for now.
Needs rebase
@AkihiroSuda How do Lima VMs created with the VZ driver behave when a M-series Mac suspends/sleeps and wakes up? Does Apple's VZ handle this automatically?
Are these save/restore commands discussed in this issue only needed to allow the user to save VM state manually? I'm assuming that suspending the laptop should not result in undefined behavior/corrupted VMs?