Invalid state while waiting for it to boot on Ubuntu 22.04

Open smutel opened this issue 2 years ago • 8 comments

Debug output

https://gist.github.com/smutel/6d93ef01e8f7fdf17aad026891c1a695

Expected behavior

The VMs start correctly and Vagrant reports no errors.

Actual behavior

When running vagrant up with the option gui = on, the VM starts, but its status is still shown as Stopped in the VirtualBox interface.

Reproduction information

Vagrant version

Vagrant 2.3.7

Host operating system

Ubuntu 22.04 (jammy)

Guest operating system

debian/bullseye64

Virtualbox version

VirtualBox 7.0.8

Steps to reproduce

  1. vagrant up

Vagrantfile

IMAGE_NAME = "debian/bullseye64"
MASTERS = 1
NODES = 2

Vagrant.configure("2") do |config|
    config.ssh.insert_key = false

    config.vm.provider "virtualbox" do |v|
        v.memory = 2048
        v.cpus = 1
    end

    (1..MASTERS).each do |i|
        config.vm.define "master-#{i}" do |master|
            master.vm.box = IMAGE_NAME
            master.vm.network "private_network", ip: "192.168.56.#{i + 100}"
            master.vm.hostname = "k8s-master-#{i}"

            # naming the virtualmachine
            master.vm.provider :virtualbox do |vb|
                vb.name = "k8s-master-#{i}"
            end

            master.vm.provision "file", source: "~/.ssh/id_rsa.pub", destination: "/tmp/id_rsa.pub"

            # change ansible to ansible_local if you are running from windows,
            # so that vagrant will install ansible inside VM and run ansible playbooks
            # eg: master.vm.provision "ansible_local" do |ansible|
            master.vm.provision "ansible_local" do |ansible|
                ansible.compatibility_mode = "2.0"
                ansible.playbook = "node-config.yml"
            end
        end
    end

    (1..NODES).each do |i|
        config.vm.define "node-#{i}" do |node|
            node.vm.box = IMAGE_NAME
            node.vm.network "private_network", ip: "192.168.56.#{i + 110}"
            node.vm.hostname = "k8s-node-#{i}"

            # naming the virtualmachine
            node.vm.provider :virtualbox do |vb|
                vb.name = "k8s-node-#{i}"
            end

            node.vm.provision "file", source: "~/.ssh/id_rsa.pub", destination: "/tmp/id_rsa.pub"

            # change ansible to ansible_local if you are running from windows,
            # so that vagrant will install ansible inside VM and run ansible playbooks
            # eg: node.vm.provision "ansible_local" do |ansible|
            node.vm.provision "ansible_local" do |ansible|
                ansible.compatibility_mode = "2.0"
                ansible.playbook = "node-config.yml"
            end
        end
    end
end

smutel avatar Jun 30 '23 12:06 smutel

In .vagrant/machines/master-1/virtualbox there are two files containing a UUID: id and index_uuid. The UUID of the VM started by Vagrant seems to be in the id file, and the UUID of the VM created in VirtualBox seems to be in the index_uuid file. So when Vagrant uses the UUID in index_uuid, it cannot get the status of the VM.

Does anybody have more information that could help me find a workaround or troubleshoot what is wrong?
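
To check which of the two stored UUIDs VirtualBox actually knows about, something like the following Ruby sketch could help. It assumes the usual `"name" {uuid}` line format of VBoxManage list vms; the helper names (parse_list_vms, read_machine_uuids, match_uuids) are made up for illustration and are not part of Vagrant. As far as I know, index_uuid is Vagrant's own identifier for its global machine index (the one behind vagrant global-status) rather than a VirtualBox UUID, so it may legitimately match no VM at all.

```ruby
# Sketch: find out which UUID stored in a Vagrant machine directory is
# actually registered in VirtualBox. Helper names are hypothetical.

# Parse the output of `VBoxManage list vms`, whose lines look like:
#   "k8s-master-1" {01234567-89ab-cdef-0123-456789abcdef}
# Returns a hash mapping lowercase UUID => VM name.
def parse_list_vms(output)
  output.each_line.with_object({}) do |line, vms|
    match = line.match(/^"(.*)" \{([0-9a-fA-F-]+)\}\s*$/)
    vms[match[2].downcase] = match[1] if match
  end
end

# Read the `id` and `index_uuid` files from a machine data directory.
# A missing file is reported as nil.
def read_machine_uuids(dir)
  %w[id index_uuid].to_h do |name|
    path = File.join(dir, name)
    [name, File.exist?(path) ? File.read(path).strip.downcase : nil]
  end
end

# For each stored UUID, return the name of the matching registered VM,
# or nil when VirtualBox does not list that UUID.
def match_uuids(stored, registered)
  stored.transform_values { |uuid| uuid && registered[uuid] }
end

# Usage (requires VBoxManage on the PATH):
#   registered = parse_list_vms(`VBoxManage list vms`)
#   stored = read_machine_uuids(".vagrant/machines/master-1/virtualbox")
#   p match_uuids(stored, registered)
```

A nil next to id would mean that the VM Vagrant thinks it created is not registered for the current user, which is a different problem from index_uuid simply not being a VirtualBox UUID.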

smutel avatar Jul 31 '23 10:07 smutel

Does not work with: https://releases.hashicorp.com/vagrant/2.3.7/vagrant_2.3.7_linux_amd64.zip

Works correctly with: https://releases.hashicorp.com/vagrant/2.3.7/vagrant_2.3.7-1_amd64.deb

smutel avatar Aug 01 '23 07:08 smutel

I have had this issue for several months already, across 6.1, 6.2, 6.3.

It happens with any machine I try to start, basically making it impossible to use Vagrant with VirtualBox 7.0 <.<

dragetd avatar Sep 12 '23 19:09 dragetd

There seems to be an issue if you reconfigure the location where your VMs are stored. It causes VirtualBox to report the VM state as stopped, even though it is running.

While the VM is running, you can check this by running VBoxManage showvminfo xxxx-xxxxx-xxx-xxxx --machinereadable | grep -i state
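
The same check can be scripted. Here is a minimal Ruby sketch that pulls the VMState value out of the --machinereadable output, assuming it consists of key=value lines with quoted string values (for example VMState="running"). The helper names parse_vminfo and vm_state are hypothetical.

```ruby
# Sketch: extract the VM state from the output of
#   VBoxManage showvminfo <uuid> --machinereadable
# Helper names are hypothetical.

# Turn key=value lines into a hash, stripping the surrounding double
# quotes that VBoxManage puts around string values and some keys.
def parse_vminfo(output)
  output.each_line.with_object({}) do |line, info|
    key, value = line.chomp.split("=", 2)
    next if value.nil? # skip blank lines and lines without "="
    info[key.delete('"')] = value.delete_prefix('"').delete_suffix('"')
  end
end

# Return the reported state (e.g. "running" or "poweroff"), or nil if
# the output has no VMState line.
def vm_state(output)
  parse_vminfo(output)["VMState"]
end

# Usage (requires VBoxManage on the PATH; replace the placeholder UUID):
#   puts vm_state(`VBoxManage showvminfo xxxx-xxxxx-xxx-xxxx --machinereadable`)
```

If this prints poweroff while the VM window is visibly up and running, you are seeing the same mismatch.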

This could be one of the reasons, I am still investigating.

dragetd avatar Sep 14 '23 11:09 dragetd

Okay, even with a fresh 7.0.10 install and my VMs under the default ~/.VirtualBox/Machines location, it still breaks.

Somehow, Vagrant manages to create VMs in some kind of limbo state. They are created and even show up in 'vboxmanage list vms', but are reported as 'VMState: Powered Off' despite running.

This was introduced with 7.0.

While a running VM being reported as 'VMState: Powered Off' clearly looks like a VirtualBox error, it is still a puzzle to me how Vagrant creates VMs that are broken like this.

VBox has a concept of registering and unregistering VMs. It can create / clone a VM but not register it. I am not sure how this code calls the binary: https://github.com/hashicorp/vagrant/blob/main/plugins/providers/virtualbox/action/import.rb#L22

Now, if we make sure to get a --register into there somehow, it might help and make the VM actually appear properly for the rest of the tools. See https://www.virtualbox.org/manual/ch08.html#idm14413

Can we get this into vagrant somehow?

Edit

No, this is not the issue. The VM registers normally when the VBoxManage commands are called manually the same way Vagrant calls them, but somehow Vagrant seems to do something different. Maybe an env var or something else?
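
One way to test the env var theory is to diff the environment of a normal shell against the environment VBoxManage sees when Vagrant launches it. The Ruby sketch below only does the comparison; how the second dump is captured is left open (for example with a hypothetical wrapper script placed ahead of VBoxManage in the PATH that writes the output of env to a file). The helper names parse_env and env_diff are made up, and values containing newlines are not handled.

```ruby
# Sketch: compare two `env` dumps to spot variables that differ between
# a plain shell and the process environment Vagrant gives VBoxManage.
# Helper names are hypothetical.

# Parse `env` style output (one KEY=value per line) into a hash.
# Lines without "=" are ignored.
def parse_env(text)
  text.each_line.with_object({}) do |line, env|
    key, value = line.chomp.split("=", 2)
    env[key] = value unless value.nil?
  end
end

# Return every variable whose value differs between the two hashes as
# key => [value_in_a, value_in_b]; nil marks a variable that is unset.
def env_diff(a, b)
  (a.keys | b.keys).sort.each_with_object({}) do |key, diff|
    diff[key] = [a[key], b[key]] if a[key] != b[key]
  end
end

# Usage, assuming both dumps were saved to files beforehand:
#   shell   = parse_env(File.read("env-shell.txt"))
#   vagrant = parse_env(File.read("env-vagrant.txt"))
#   env_diff(shell, vagrant).each { |k, (a, b)| puts "#{k}: #{a.inspect} -> #{b.inspect}" }
```

Variables that only show up on the Vagrant side, or paths such as PATH, HOME or LD_LIBRARY_PATH that point somewhere else, would be the first things to look at.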

dragetd avatar Sep 25 '23 20:09 dragetd

@dragetd, thank you for looking into this!

For me, using a proper binary package (a Debian package, in my case) solved the problem.

Before, when Vagrant was installed from either a Zip archive or via Homebrew (which I suppose also just unpacks the same archive), it wouldn't work. Then I tried what @smutel suggested, without any expectations, and it turned out to work.

That said, I haven't looked into why that is.

kwilczynski avatar Sep 26 '23 06:09 kwilczynski

@kwilczynski I now see why this fixed it for you. Switching to an (outdated) distribution package solved it as well for me.

It is a Vagrant AppImage issue! Somehow the AppImage is able to run VBoxManage in a different way that breaks VirtualBox, so the VM is not properly registered. Running the command manually works. AppImage does not use any namespace/container techniques, so I am really not sure who to blame here: how VirtualBox can be run so that it breaks itself, or how Vagrant manages to do so. xD

I created a VirtualBox issue with some more info: https://www.virtualbox.org/ticket/21889

dragetd avatar Oct 29 '23 11:10 dragetd

I've been wrestling with this issue for the better part of the year. I always assumed I had a misconfiguration or a missing kernel module somewhere. Using the package provided in the Gentoo repo solved this error for me, even though it's a downgraded version compared to what's offered on the Hashicorp site. I've now run into an unrelated error that may present another wall for me (https://github.com/hashicorp/vagrant/issues/12807), however the error referenced above was resolved by forgoing Hashicorp's AppImage.

PerennaSec avatar Nov 04 '23 15:11 PerennaSec