
vz: implement auto save/restore

Open norio-nomura opened this issue 1 year ago • 27 comments

  • [x] Only aarch64 is supported due to lack of support in Code-Hex/vz/v3.
  • [x] Attempt to resume if the vz-machine-state file is found in the instance directory. If this fails, proceed with a cold boot.
  • [x] On stop, always attempt to suspend to the vz-machine-state file; if this fails, perform a shutdown.
  • [x] Guest OS and Guest Agent behavior are not considered here. Handling may be needed for suspending while port forwarding is active.
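
The fallback rules in the list above can be sketched roughly as follows. This is only an illustration: the `vm` interface, `fakeVM`, and the `start`/`stop` helpers are hypothetical names, not the Code-Hex/vz API or the code in this PR. Only the `vz-machine-state` file name comes from the description.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
)

// vm is a hypothetical stand-in for the driver's VM handle
// (an assumption for this sketch, not the Code-Hex/vz API).
type vm interface {
	Restore(statePath string) error
	ColdBoot() error
	Save(statePath string) error
	Shutdown() error
}

// statePath returns where the saved machine state lives in an instance directory.
func statePath(instDir string) string {
	return filepath.Join(instDir, "vz-machine-state")
}

// stateFileExists reports whether a saved state file is present.
func stateFileExists(instDir string) bool {
	_, err := os.Stat(statePath(instDir))
	return err == nil
}

// start restores when a state file was found; if there is none, or the
// restore fails, it proceeds with a cold boot. It returns the path taken.
func start(m vm, instDir string, found bool) (string, error) {
	if found {
		if err := m.Restore(statePath(instDir)); err == nil {
			return "restored", nil
		}
	}
	if err := m.ColdBoot(); err != nil {
		return "", err
	}
	return "cold boot", nil
}

// stop first attempts to save to the state file; if that fails it shuts down.
func stop(m vm, instDir string) (string, error) {
	if err := m.Save(statePath(instDir)); err == nil {
		return "saved", nil
	}
	if err := m.Shutdown(); err != nil {
		return "", err
	}
	return "shut down", nil
}

// fakeVM simulates restore/save failures so the logic can be tried without a VM.
type fakeVM struct{ failRestore, failSave bool }

func (f fakeVM) Restore(string) error {
	if f.failRestore {
		return errors.New("restore failed")
	}
	return nil
}

func (f fakeVM) ColdBoot() error { return nil }

func (f fakeVM) Save(string) error {
	if f.failSave {
		return errors.New("save failed")
	}
	return nil
}

func (f fakeVM) Shutdown() error { return nil }
```

A caller would pass `stateFileExists(instDir)` as the `found` argument, for example `start(fakeVM{failRestore: true}, dir, stateFileExists(dir))`.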

I will mark this as Ready for Review once the following are addressed:

  • [x] Add .saveOnStop to limayaml.
  • [x] limactl resume: Not needed.
  • [x] limactl start: If the saved state is still present, restore.
  • [x] Add limactl save: Request .saveOnStop = true to ha.sock and send SIGINT to hostagent.
  • [x] limactl stop: Request .saveOnStop = false to ha.sock and send SIGINT to hostagent.
  • [x] hostagent: Upon receiving SIGINT, if .saveOnStop is true, save; if false, shut down.
  • [x] hostagent: Add an API to ha.sock that accepts requests to update .saveOnStop.
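
To make the save/stop flow above concrete, here is a minimal in-process sketch. The request shape (`PUT` with a JSON body such as `{"saveOnStop": true}`), the `hostAgent` type, and the `request` helper are assumptions for illustration and are not Lima's actual ha.sock API. Treating the zero value as "shut down" is also an assumption.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// hostAgent holds the flag that decides what a SIGINT does.
// The zero value (false, i.e. shut down) is an assumption of this sketch.
type hostAgent struct {
	mu         sync.Mutex
	saveOnStop bool
}

// ServeHTTP accepts a body such as {"saveOnStop": true}.
// The endpoint shape is hypothetical, not the real ha.sock API.
func (h *hostAgent) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPut {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req struct {
		SaveOnStop *bool `json:"saveOnStop"`
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.SaveOnStop == nil {
		http.Error(w, "expected a JSON body with a boolean saveOnStop", http.StatusBadRequest)
		return
	}
	h.mu.Lock()
	h.saveOnStop = *req.SaveOnStop
	h.mu.Unlock()
	w.WriteHeader(http.StatusNoContent)
}

// onSIGINT returns the action the agent would take when interrupted.
func (h *hostAgent) onSIGINT() string {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.saveOnStop {
		return "save"
	}
	return "shutdown"
}

// request plays the client side in-process: it updates the flag the way
// `limactl save` (true) or `limactl stop` (false) would, then reports what
// the subsequent SIGINT would do. No socket or real signal is involved.
func request(h *hostAgent, saveOnStop bool) (status int, action string) {
	body := `{"saveOnStop": false}`
	if saveOnStop {
		body = `{"saveOnStop": true}`
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPut, "/", strings.NewReader(body)))
	return rec.Code, h.onSIGINT()
}
```

In a real host agent the handler would be served over the Unix socket and `onSIGINT` would be driven by a signal handler; the sketch only shows that the last value written before the signal decides between save and shutdown.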

Items to be addressed in a separate PR due to the potential evolution into a snapshot implementation:

  • [ ] limactl save: If lima.yaml has changed since the VM started, prompt the user to run limactl stop or limactl reboot so the changes are applied to the VM.
  • [ ] hostagent: Use the same lima.yaml and cidata.iso for both save and restore.
    • On VM start, keep copies of lima.yaml and cidata.iso for the next restore.
    • On VM restore, use the lima.yaml and cidata.iso kept at start.
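
One possible shape for the "keep on start, reuse on restore" idea is sketched below. The `restore` subdirectory name and the helper names are hypothetical; the PR does not specify where the kept copies would live.

```go
package main

import (
	"io"
	"os"
	"path/filepath"
)

// keptDir is a hypothetical subdirectory for the copies taken at VM start.
const keptDir = "restore"

// keptFiles lists the inputs that must match between save and restore.
var keptFiles = []string{"lima.yaml", "cidata.iso"}

// copyFile copies src to dst, creating or truncating dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// keepForRestore copies the files used for this boot so that a later
// restore sees the same inputs even if the originals are edited.
func keepForRestore(instDir string) error {
	dst := filepath.Join(instDir, keptDir)
	if err := os.MkdirAll(dst, 0o700); err != nil {
		return err
	}
	for _, name := range keptFiles {
		if err := copyFile(filepath.Join(instDir, name), filepath.Join(dst, name)); err != nil {
			return err
		}
	}
	return nil
}

// configPath picks the kept copy when restoring and the latest file on a cold boot.
func configPath(instDir, name string, restoring bool) string {
	if restoring {
		return filepath.Join(instDir, keptDir, name)
	}
	return filepath.Join(instDir, name)
}
```

With this layout, a cold boot would call `keepForRestore(instDir)` after reading the latest files, and a restore would read `configPath(instDir, "lima.yaml", true)` instead.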

Future needs not included in this PR:

  • [ ] limactl reboot: TBD

~~This implements part of #598.~~ I’ll withdraw the reference to the issue since it seems to be focused specifically on QEMU.

norio-nomura avatar Nov 13 '24 07:11 norio-nomura

Using suspend/resume shortens the startup time of the Docker VM I use from 37 seconds to 13 seconds.

norio-nomura avatar Nov 13 '24 07:11 norio-nomura

Do you have an issue describing the feature?

Did you look at the PR #642?

afbjorklund avatar Nov 13 '24 14:11 afbjorklund

Did you look at the PR #642?

Yes, initially I started implementing this as an extension of #642, but since it didn’t align at all with the behavior of the API provided by Virtualization.framework, I implemented it from scratch.

norio-nomura avatar Nov 13 '24 14:11 norio-nomura

I implemented it from scratch.

Ah, however, this PR does not implement limactl suspend or limactl resume. Instead, an option is added to limayaml that allows start/stop to perform resume/suspend actions.

norio-nomura avatar Nov 13 '24 15:11 norio-nomura

You might want to add that as a separate feature, i.e. describe the auto-suspend

afbjorklund avatar Nov 13 '24 15:11 afbjorklund

In addition to vz-machine-state, it might be necessary to save the cidata.iso and lima.yaml at the point of suspend for use during resume.

norio-nomura avatar Nov 15 '24 05:11 norio-nomura

In addition to vz-machine-state, it might be necessary to save the cidata.iso and lima.yaml at the point of suspend for use during resume.

And to apply the latest lima.yaml to the guest, a cold boot might be required.

norio-nomura avatar Nov 15 '24 05:11 norio-nomura

to apply the latest lima.yaml to the guest, a cold boot might be required.

limactl edit already requires the machine to be stopped.

jandubois avatar Nov 15 '24 05:11 jandubois

to apply the latest lima.yaml to the guest, a cold boot might be required.

limactl edit already requires the machine to be stopped.

I always edit lima.yaml in VSCode. 😝

norio-nomura avatar Nov 15 '24 06:11 norio-nomura

Eventually we could add a limactl reboot (or restart) command because I think the only time you really want to shutdown the instance is when you want a fresh boot.

A restart command might be needed sooner than I expected. Currently, I restart using:

launchctl kickstart -kp gui/501/io.lima-vm.autostart.docker; tail -F ~/.lima/docker/{ha.stderr,launchd.stderr,serialv}.log

However, with launchctl, it ends up performing suspend/resume instead, making it impossible to actually restart.

norio-nomura avatar Nov 15 '24 06:11 norio-nomura

I always edit lima.yaml in VSCode. 😝

So how would you deal with this?

When you suspend, but the lima.yaml file has been edited more recently than the cidata.iso, then you will touch the lima.yaml file to make sure it is newer than the saved state.

And when you start, and the lima.yaml has been edited more recently than the vz-machine-state, then you will ask the user if they want to drop the suspended state?

I guess that works, but is maybe too magical?

jandubois avatar Nov 15 '24 06:11 jandubois

Perhaps it would be better to shut down instead of suspending if the lima.yaml has been modified since startup?

norio-nomura avatar Nov 15 '24 06:11 norio-nomura

And when you start, and the lima.yaml has been edited more recently than the vz-machine-state, then you will ask the user if they want to drop the suspended state?

Exactly. For that, I’m thinking of saving the lima.yaml and cidata.iso from the time of startup. Resume would always use those saved versions, while only a cold boot would use the latest lima.yaml.

norio-nomura avatar Nov 15 '24 06:11 norio-nomura

Perhaps it would be better to shut down instead of suspending if the lima.yaml has been modified since startup?

Maybe, but you would lose all your running state/workloads, so shouldn't there be a prompt?

That's why limactl edit requires you to stop first.

Now that I think about it more, I think if you edit lima.yaml outside of limactl edit, then it is your responsibility to do a limactl reboot to trigger the changes to take effect. Lima itself should continue to suspend/resume without worrying about your changes.

jandubois avatar Nov 15 '24 06:11 jandubois

Would it be better if limactl suspend detects changes to lima.yaml and prompts for either limactl stop or limactl reboot, instead of having hostagent detect it when receiving SIGINT?

norio-nomura avatar Nov 15 '24 06:11 norio-nomura

Would it be better if limactl suspend detects changes to lima.yaml and prompts for either limactl stop or limactl reboot

This does not work if the suspend happens in the background, e.g. because you are logging out. The same applies when you resume automatically. In both cases you have no chance to prompt the user.

If the user edits lima.yaml without stopping the instance, then the user is responsible for eventually applying the changes by requesting a shutdown. We can ask when the instance is stopped or resumed, and stdin/stdout are both connected to a TTY. But otherwise the changes should be ignored.
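
The policy described here could be sketched as a small decision function. The names are hypothetical, and the modification-time comparison follows the earlier suggestion in this thread of comparing lima.yaml against vz-machine-state. The caller is assumed to determine whether stdin and stdout are both connected to a TTY and pass that in as `interactive`.

```go
package main

import "os"

// startAction is what limactl would do on start (hypothetical type).
type startAction int

const (
	restoreState startAction = iota // resume from the saved state
	coldBoot                        // boot normally
	askUser                         // prompt whether to drop the saved state
)

// decideStart applies the rule from the discussion: only ask when a saved
// state exists, lima.yaml changed, and a user can actually be prompted.
// In the background the edit is ignored and the state is restored.
func decideStart(hasState, yamlChanged, interactive bool) startAction {
	switch {
	case !hasState:
		return coldBoot
	case yamlChanged && interactive:
		return askUser
	default:
		return restoreState
	}
}

// yamlNewerThanState reports whether lima.yaml was modified after the
// saved state was written, using file modification times.
func yamlNewerThanState(yamlPath, statePath string) (bool, error) {
	y, err := os.Stat(yamlPath)
	if err != nil {
		return false, err
	}
	s, err := os.Stat(statePath)
	if err != nil {
		return false, err
	}
	return y.ModTime().After(s.ModTime()), nil
}
```

Keeping the decision separate from the file and terminal checks makes the background case easy to reason about: with `interactive` false, an edited lima.yaml never blocks a restore.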

jandubois avatar Nov 15 '24 06:11 jandubois

Updated PR description.

norio-nomura avatar Nov 16 '24 02:11 norio-nomura

Documentation is missing. It is better to start with documentation, before implementing. This helps to discuss the big picture, evaluate the user experience, and is much quicker to iterate on.

nirs avatar Nov 16 '24 15:11 nirs

Updated the commits and PR description to align the terminology with libvirt:

  • suspend -> save
  • resume -> restore
  • .suspendOnSigInt -> .saveOnStop

norio-nomura avatar Nov 18 '24 00:11 norio-nomura

Why not use the same names as in the qemu implementation? (suspend/resume)

We already used "save" (and "load") for saving/loading a snapshot, internally that is.

afbjorklund avatar Nov 18 '24 06:11 afbjorklund

Why not use the same names as in the qemu implementation?

Because its behavior differs from the QEMU implementation.
In the QEMU implementation, suspend does not save the VM state to disk, and resume does not restore the saved VM state.

norio-nomura avatar Nov 18 '24 06:11 norio-nomura

Right, then it is a good thing to use different names. Actually the commands were called STOP and CONT in qemu.

For suspending to disk, the state is also called SUSPEND (standby/S3) or SUSPEND_DISK (hibernate/S4). My bad.

afbjorklund avatar Nov 18 '24 06:11 afbjorklund

The changes to ensure that the same lima.yaml and cidata.iso are used for save/restore might potentially evolve into a snapshot implementation. Therefore, I would like to address those in a separate PR.

norio-nomura avatar Nov 18 '24 11:11 norio-nomura

I've applied the review suggestions.
I made changes to past commits and force-pushed, so please fetch the latest and review again.

norio-nomura avatar Nov 19 '24 01:11 norio-nomura

I force-pushed to remove a stray code block that had no effect.

norio-nomura avatar Nov 19 '24 02:11 norio-nomura

I have other priorities to focus on, so I converted this to a draft for now.

norio-nomura avatar Nov 28 '24 00:11 norio-nomura

Needs rebase

AkihiroSuda avatar May 07 '25 05:05 AkihiroSuda

@AkihiroSuda How do Lima VMs created with the VZ driver behave when an M-series Mac suspends/sleeps and wakes up? Does Apple's VZ handle this automatically?

Are these save/restore commands discussed in this issue only needed to allow the user to save VM state manually? I'm assuming that suspending the laptop should not result in undefined behavior/corrupted VMs?

msimkunas avatar Jul 14 '25 18:07 msimkunas