multipass icon indicating copy to clipboard operation
multipass copied to clipboard

Fix bugs of force stop

Open georgeliao opened this issue 1 year ago • 5 comments

close #3624 close #3606 The fix basically swallows the VMStateInvalidException throw from void mp::BaseVirtualMachine::check_state_for_shutdown(bool force) function. The new behavior of multipass delete (non-purge) is to do nothing if the VM state is suspending, suspended, starting, restarting. If the VM is running, the shutdown(purge) still applies.

The fix seems to be solid based on behavior. But it does get the exception catching and branching messy. If we want to treat the root cause of the bug, we have to restructure the code a bit. The root cause is that the check_state_for_shutdown function is essentially checking and throwing for the invocation of shutdown(force) in daemon::stop, but resides in shutdown function. As a result, it also throws the incorrect exception for daemon::delete.

So a more thorough fix would be moving the function void mp::BaseVirtualMachine::check_state_for_shutdown(bool force) to daemon::delete and daemon::stop. Both operations should have their respective state check. But this involves more code change. It is open for discussion.

georgeliao avatar Aug 05 '24 09:08 georgeliao

Codecov Report

Attention: Patch coverage is 62.79070% with 16 lines in your changes missing coverage. Please review.

Project coverage is 88.92%. Comparing base (5f4dfe2) to head (7d01280). Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
src/daemon/daemon.cpp 6.25% 15 Missing :warning:
.../platform/backends/shared/base_virtual_machine.cpp 88.88% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3625      +/-   ##
==========================================
+ Coverage   88.85%   88.92%   +0.06%     
==========================================
  Files         254      254              
  Lines       14269    14271       +2     
==========================================
+ Hits        12679    12690      +11     
+ Misses       1590     1581       -9     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Aug 05 '24 10:08 codecov[bot]

Firstly, I don't think this is the proper way of fixing this. It's more of a hack than an actually fix. I think we need to go back to the question of "What is non-purge delete supposed to do?". Moving check_state_for_shutdown() to the daemon isn't the correct proper fix either in my opinion. Instead we should be evaluating what stop is supposed to do in the context of a delete of a suspended VM.

To answer the question "What is non-purge delete supposed to do?", let's delve into the expected behavior of each state.

Major states:

  1. Suspended: Shutdown should be skipped, and the VM is moved into deleted_instance. When the VM is recovered, the VM state is still suspended. That is what the current code does.
  2. Running: non-force shutdown is applied, the state changes to stopped and the VM is moved into deleted_instance. When the VM is recovered, the VM state is stopped. That is what the current code does.
  3. Stopped: Shutdown should be skipped, and the VM is moved into deleted_instance. When the VM is recovered, the VM state is still stopped. That is what the current code does.

So in summary, the catch exception and do nothing did the skip shutdown thing. In my opinion, this is a reasonable thing to do while the check_state_for_shutdown function is still in the void mp::VirtualMachine::shutdown(bool force) and it throws exception for both multipass stop and multipass delete. Feel free to share any other code design idea you might have.

There are other minor states like starting or restarting, these states on cmd blocks the daemon. The user will have to open another terminal to delete the starting VM. The ideal behavior would be waiting for the starting process finish and shut it down gracefully from the running state. However, I do not think we have good thread safety for this in the first place, so tackling these states would be beyond the scope of this PR.

georgeliao avatar Aug 28 '24 10:08 georgeliao

Secondly, it would be nice if the commit history got cleaned up a bit. Currently, only f34db7f is relevant to the final diff, so the rest of the commits can be dropped.

This can be done, sorry about that. Initially I was thinking about put two fixes into one PR.

georgeliao avatar Aug 28 '24 10:08 georgeliao

There are other minor states like starting or restarting, these states on cmd blocks the daemon. The user will have to open another terminal to delete the starting VM. The ideal behavior would be waiting for the starting process finish and shut it down gracefully from the running state. However, I do not think we have good thread safety for this in the first place, so tackling these states would be beyond the scope of this PR.

Hmm, I would not say that those states are out of scope, but I think we could simplify and not wait for a graceful shutdown either. Instead, multipass delete could still refuse to delete instances in those states (although with a different error message, not mentioning --force). The user could still use multipass delete --purge to force-shutdown and then delete. So, I think "suspended" is the only state where, to maintain behavior, we'd need to skip shutdown.

Here is an option to achieve that: mp::VirtualMachine::shutdown could receive an enum instead of a bool, to accept a third possible value: accepted_suspended (or a better name). So, on Daemon::delete_vm, we'd now call shutdown(purge ? OurEnum::force : OurEnum::accept_suspended). On stop, we'd call instead shutdown(force ? OurEnum::force : OurEnum::plain) or something of this sort.

ricab avatar Aug 28 '24 11:08 ricab

@sharder996 There are two items in email message that were edited later maybe. Regardless, just to clarify them

  1. Do nothing: check_state_for_shutdown() needs to change to not throw an exception.
  2. Put the machine in a stopped state: do the same thing as force stop.
  1. The throw is intended for multipass stop -> shutdown(force) and now it affects multipass delete -> shutdown(purge) we can not just simply change that throw.
  2. I think doing nothing and recovering it back to suspended is more reasonable, force stop is the one can change from suspended to stop state.

georgeliao avatar Aug 28 '24 12:08 georgeliao

This just occurred to me and I thought I'd note it down. We need to remember to test what happens when we pass multiple instance arguments or use --all. What happens when a VM in the wrong state and we do stuff like:

  • multipass delete a b c
  • multipass delete --all
  • multipass delete --all --purge
  • multipass stop --force a b c
  • multipass stop --all
  • multipass stop --all --force

Do we do the right thing in all those cases?

ricab avatar Aug 30 '24 18:08 ricab

This just occurred to me and I thought I'd note it down. We need to remember to test what happens when we pass multiple instance arguments or use --all. What happens when a VM in the wrong state and we do stuff like:

multipass delete vm1 vm2 vm3 multipass delete --all multipass delete --all --purge multipass stop --force vm1 vm2 vm3 multipass stop --all multipass stop --all --force Do we do the right thing in all those cases?

I changed the name of the VM to vm1, vm2 and vm3 for eaiser illustration.

It is thoughtful that you mentioned the bulk operations. This boils down to how the command behaves when one of the VM throws during this operation. Based on this line of code, it should exit on the first throwing VM and report the error to the cli and to the user. For example, if vm1 is running, vm2 is suspended and v,3 is stopped, then multipass stop vm1 vm2 vm3 should stop vm1 and throw on vm2 and leave vm3 intact. See the below command line output.

georgel@georgel-XPS-15:~/projects/canonical_projects/multipass_build/bin$
./multipass list
Name                    State             IPv4             Image
vm1                     Running           10.74.98.248     Ubuntu 24.04 LTS
vm2                     Suspended         --               Ubuntu 24.04 LTS
vm3                     Stopped           --               Ubuntu 24.04 LTS
georgel@georgel-XPS-15:~/projects/canonical_projects/multipass_build/bin$
./multipass stop vm1 vm2 vm3

stop failed: Cannot shut down suspended instance vm2.
Use --force to power it off.
georgel@georgel-XPS-15:~/projects/canonical_projects/multipass_build/bin$
./multipass list
Name                    State             IPv4             Image
vm1                     Stopped           --               Ubuntu 24.04 LTS
vm2                     Suspended         --               Ubuntu 24.04 LTS
vm3                     Stopped           --               Ubuntu 24.04 LTS

Regarding these three lines,

multipass delete --all --purge multipass stop --force vm1 vm2 vm3 multipass stop --all --force

They do not throw, so they can apply the operation to every VMs.

georgeliao avatar Sep 02 '24 13:09 georgeliao

Hi @georgeliao, thanks for clarifying. I did not quite understand your last comment though. Do you mean that those commands do not go through the line of code you linked to earlier?

In any case, I still think this should be manually tested, just to verify expectations.

ricab avatar Sep 02 '24 14:09 ricab

Hi @georgeliao, thanks for clarifying. I did not quite understand your last comment though. Do you mean that those commands do not go through the line of code you linked to earlier?

What I was trying to say is that things are simpler without throwing. The loop will not exit early. Every VM in that selection will get the operation applied or skipped. It is not that different from applying the operation to a single VM.

georgeliao avatar Sep 02 '24 14:09 georgeliao

In some cases it will still need to throw, right? Anyway, just trying to make sure the behavior is verified experimentally.

ricab avatar Sep 02 '24 14:09 ricab

@ricab @sharder996 , the follow-up codework is done based on our team discussion, so it is ready for another round of review. Please read the intro description first to get an overview of the steps and the functional testing cases.

georgeliao avatar Sep 03 '24 09:09 georgeliao

@georgeliao Just needs the private side now 😄

sharder996 avatar Sep 16 '24 14:09 sharder996