kitchen-vagrant
kitchen-vagrant copied to clipboard
TestKitchen refuses to destroy VMs that failed to finish the create step
Description
If something fails after a VM is created but still during the kitchen create
step, TestKitchen thinks the VM hasn't been created and will refuse to destroy it.
Kitchen Version
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% kitchen --version
Test Kitchen version 1.22.0
ChefDK Version
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% chef --version
Chef Development Kit Version: 3.1.0
chef-client version: 14.2.0
delivery version: master (6862f27aba89109a9630f0b6c6798efec56b4efe)
berks version: 7.0.4
kitchen version: 1.22.0
inspec version: 2.1.72
Platform Version
Arch Linux
Replication Case
Start with no running VMs:
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% kitchen list
Instance Driver Provisioner Verifier Transport Last Action Last Error
default-cub-rh7 Vagrant ChefZero CubBusser Sftp <Not Created> <None>
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% vboxmanage list vms | grep rh7
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]%
Then during an attempt to create a VM via TestKitchen, the create step fails after the VM comes up:
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% kitchen create
-----> Starting Kitchen (v1.22.0)
-----> Creating <default-cub-rh7>...
==> default: Checking for updates to 'cub_rh7'
default: Latest installed version: 7.5-13
default: Version constraints:
default: Provider: virtualbox
==> default: Box 'cub_rh7' (v7.5-13) is running the latest version.
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'cub_rh7'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'cub_rh7' is up to date...
==> default: Setting the name of the VM: default-cub-rh7_default_1537467634370_16029
==> default: Fixed port collision for 22 => 2222. Now on port 2200.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 (guest) => 2200 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2200
default: SSH username: vagrant
default: SSH auth method: private key
==> default: Machine booted and ready!
==> default: Registering box with vagrant-registration...
Organization bogus does not exist.
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>> Failed to complete #create action: [Expected process to exit with [0], but received '1'
---- Begin output of vagrant up --provider virtualbox ----
STDOUT: Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'cub_rh7'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'cub_rh7' is up to date...
==> default: Setting the name of the VM: default-cub-rh7_default_1537467634370_16029
==> default: Fixed port collision for 22 => 2222. Now on port 2200.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 (guest) => 2200 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2200
default: SSH username: vagrant
default: SSH auth method: private key
==> default: Machine booted and ready!
==> default: Registering box with vagrant-registration...
STDERR: Organization bogus does not exist.
---- End output of vagrant up --provider virtualbox ----
Ran vagrant up --provider virtualbox returned 1] on default-cub-rh7
>>>>>> ----------------------
>>>>>> Please see .kitchen/logs/kitchen.log for more details
>>>>>> Also try running `kitchen diagnose --all` for configuration
kitchen create 4.09s user 1.74s system 7% cpu 1:13.90 total
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]%
Consequently, the VM exists according to VirtualBox, but does not exist according to TestKitchen:
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% vboxmanage list vms | grep rh7
"default-cub-rh7_default_1537467634370_16029" {5ac5f267-f698-4dcf-89ec-6838d53d1d44}
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% kitchen list
Instance Driver Provisioner Verifier Transport Last Action Last Error
default-cub-rh7 Vagrant ChefZero CubBusser Sftp <Not Created> Kitchen::ShellOut::ShellCommandFailed
And even after a kitchen destroy
, the VM still exists:
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% kitchen destroy
-----> Starting Kitchen (v1.22.0)
-----> Destroying <default-cub-rh7>...
Finished destroying <default-cub-rh7> (0m0.00s).
-----> Kitchen is finished. (0m0.19s)
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% kitchen list
Instance Driver Provisioner Verifier Transport Last Action Last Error
default-cub-rh7 Vagrant ChefZero CubBusser Sftp <Not Created> <None>
[birdsnest ~/Work/scratchpad/cub_auth](master|✔)[I]% vboxmanage list vms | grep rh7
"default-cub-rh7_default_1537467634370_16029" {5ac5f267-f698-4dcf-89ec-6838d53d1d44}
I see the same behavior if create fails during kitchen test --destroy=always
as well.
I expect that kitchen create; kitchen destroy
or kitchen test --destroy=always
will always result in no running VMs leftover, but this behavior does not match that expectation.
The first step is making sure destroy double checks for orphans. Then I can look at perhaps creating another state as Vagrant still creates a VM despite failure - for the most part other drivers like cloud ones destroy the instance if anything bad happens during create. It's less that it refuses to destroy and more it doesn't know it has been created because it receives the error code back from Vagrant.
This might require work in core as well but that's after making destroy
more robust for Vagrant specifically.
It looks like this issue still exists. @cheeseplus are you still working on this or could I perhaps give it a try?
I am not working on it and I've been mostly out of the ecosystem for over a year so go nuts!
It's less that it refuses to destroy and more it doesn't know it has been created because it receives the error code back from Vagrant.
Maybe it's just that I have misplaced expectations, but I would expect an explicit kitchen destroy
to at least attempt a destroy regardless. What it does right now is not make an attempt because of a false assumption about the state of the VM.
It's a reasonable user facing expectation, I was speaking to the problem of an abstraction depending on another abstraction - at some point if you're spending more time trying to catch exceptions of an underlying abstraction then perhaps the model was inherently a poor choice.
The false assumption is due to the nature of how Vagrant fails and because kitchen has a limited window into introspecting information during that initial create window. Aside from catching and parsing the error from Vagrant, an alternative would be some relative path checking alongside vagrant global-status
parsing. This works but only limitedly - kitchen doesn't track the state of it's own config across multiple runs so moves, additions, or removals also fall through in terms of what kitchen knows to cleanup - mentioning this here for some illustration and illumination of how convoluted it can get when trying to do the "most reasonable" thing. Cloud integrations like ec2
are a bit easier to reason with as we're talking to an API and thus ask questions in a common language as opposed to shelling out which is also a notable part of the problem with the existing implementation.
This can be mitigated but at the cost of bolstering each plugin (vagrant in this case) with more discrete updates to it's state tracking. This is doable but as one who has chased a bunch of these types of issues in the ecosystem it requires either a) adding edgecases that are already mirrored in the underlying abstraction ad infinitum in each plugin or b) rethinking how TK manages state holistically.
Elaborating here because these are contributing factors as to why there has been little movement on issues like this - it's solvable (for some value of solvable) but it's diminishing returns and rabbit holes, especially in the case of the kitchen-vagrant abstraction.
Confirmed that this is still occurring. We wil take it into the backlog though we're also interested in community contributions to resolve this, as fixing this in a clean way is a larger undertaking.