`Unable to locate a VM UEFI firmware` at `lxd.activate` time

Open • simondeziel opened this issue 10 months ago • 3 comments

Simple reproducer

root@v1:~# lxd activateifneeded --debug 2>&1
INFO   [2025-05-19T20:11:28Z] Instance type operational                     driver=lxc features="map[]" type=container
ERROR  [2025-05-19T20:11:28Z] Unable to run feature checks during QEMU initialization: Unable to locate a VM UEFI firmware 
WARNING[2025-05-19T20:11:28Z] Instance type not operational                 driver=qemu err="QEMU failed to run feature checks" type=virtual-machine
DEBUG  [2025-05-19T20:11:28Z] No need to start the daemon now

Original description

On an amd64 machine running latest/edge, with KVM supported and VMs working, I get this warning:

# snap stop lxd; journalctl -f | grep -F 'lxd.activate[' & snap start lxd; lxc ls >/dev/null; kill %1
...
Feb 20 12:50:27 sdeziel-lemur lxd.activate[942949]: ==> Setting LXD user socket ownership
Feb 20 12:50:27 sdeziel-lemur lxd.activate[942949]: ==> Checking if LXD needs to be activated
Started.
Feb 20 12:50:27 sdeziel-lemur lxd.activate[943016]: time="2025-02-20T12:50:27-05:00" level=error msg="Unable to run feature checks during QEMU initialization: Unable to locate a VM UEFI firmware"
Feb 20 12:50:27 sdeziel-lemur lxd.activate[943016]: time="2025-02-20T12:50:27-05:00" level=warning msg="Instance type not operational" driver=qemu err="QEMU failed to run feature checks" type=virtual-machine

I wonder if it could be due to the snap only shipping a debug variant of the CODE:

$ find /snap/lxd/current/ -iname '*ovmf*'
/snap/lxd/current/share/qemu/OVMF_CODE.4MB.debug.fd
/snap/lxd/current/share/qemu/OVMF_CODE.4MB.fd
/snap/lxd/current/share/qemu/OVMF_VARS.4MB.fd
/snap/lxd/current/share/qemu/OVMF_VARS.4MB.ms.fd

and to the code not looking for the debug.fd CODE file: https://github.com/canonical/lxd/blob/main/lxd/instance/drivers/edk2/edk2.go#L182-L185

@tomponline on MM said:

we just need to disable that during the activation check, really

simondeziel • Feb 20 '25 21:02

I believe it's because of this call in daemon.activate (https://github.com/canonical/lxd-pkg-snap/blob/4d919878699c4ee5c3ff9ea73fe2e153dab15214/snapcraft/commands/daemon.activate#L112):

if ! "${LXD}" activateifneeded; then                      # <<< HERE
    echo "====> Activation check failed, forcing activation"
    nsenter -t 1 -m systemctl start snap."${SNAP_INSTANCE_NAME}".daemon
fi

we call the LXD binary without setting the LXD_QEMU_FW_PATH environment variable, unlike daemon.start, which does set it.

mihalicyn • Feb 21 '25 11:02

We should figure out why these checks are being done at all during the activateifneeded stage, as I think they are unnecessary and undesirable at this stage.

There may be other checks that also don't need to happen at the activateifneeded stage.

tomponline • Feb 21 '25 12:02

I suspect it's the calls to lxd.ConnectLXDUnix() in cmdActivateifneeded.Run, which in turn hit /1.0 (because args.SkipGetServer is not true) in order to start the LXD server once it's ascertained that LXD should start.

It may be possible to avoid this by not hitting /1.0 and instead hitting the LXD socket differently to trigger it to start without then serving /1.0.

But this is just a theory...

tomponline • Feb 21 '25 12:02

@simondeziel does this still occur for you on latest/edge btw?

I couldn't recreate it.

tomponline • Jul 02 '25 08:07

Whilst I couldn't recreate it, I can see the problem.

In cmdActivateIfNeeded.run there are 2 calls to d.State():

instance.LoadNodeAll(d.State(), instancetype.Any)
d.State().DB.Cluster.Transaction(context.TODO(), func(ctx context.Context, tx *db.ClusterTx)

Aside from the fact that this is inefficient, the additional problem is that d.State() calls instanceDrivers.DriverStatuses(), which in turn calls the Info() function for each instance driver, and that is what runs the QEMU checks.

tomponline • Jul 02 '25 08:07
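The cost described in the last comment can be modelled with a small, self-contained Go sketch. Everything in it is hypothetical: daemon, state, probeDrivers, consume and skipDriverProbe are stand-ins, not LXD's real types or functions. It only shows the shape of the problem, namely that a State() accessor which rebuilds driver statuses on every call runs the expensive probe once per call, plus two ways around it: fetching the state once and reusing it, or skipping the probe on an activation-only path, in the spirit of the suggestion to disable these checks during activation.

```go
package main

import "fmt"

// driverStatus is a hypothetical stand-in for what an instance driver's Info() reports.
type driverStatus struct {
	Operational bool
}

// state is a hypothetical snapshot returned by daemon.State().
type state struct {
	DriverStatuses map[string]driverStatus
}

// daemon is a hypothetical stand-in for the object used by cmdActivateIfNeeded.run.
type daemon struct {
	probes          int  // how many times the expensive driver probe has run
	skipDriverProbe bool // hypothetical switch an activation-time caller could set
}

// probeDrivers stands in for calling Info() on every driver; for QEMU this is
// where the feature checks (and the firmware lookup) would happen.
func (d *daemon) probeDrivers() map[string]driverStatus {
	d.probes++
	return map[string]driverStatus{
		"lxc":  {Operational: true},
		"qemu": {Operational: false},
	}
}

// State rebuilds the driver statuses on every call, like the behaviour described above.
func (d *daemon) State() *state {
	s := &state{}
	if !d.skipDriverProbe {
		s.DriverStatuses = d.probeDrivers()
	}
	return s
}

// consume is a placeholder for code such as instance loading or a DB transaction.
func consume(s *state) int {
	return len(s.DriverStatuses)
}

func main() {
	// Two separate d.State() calls: the probe runs twice.
	a := &daemon{}
	consume(a.State())
	consume(a.State())
	fmt.Println("two State() calls:", a.probes) // prints 2

	// One State() call whose result is reused: the probe runs once.
	b := &daemon{}
	s := b.State()
	consume(s)
	consume(s)
	fmt.Println("state reused:", b.probes) // prints 1

	// Probe skipped, as an activation-only code path might do: it never runs.
	c := &daemon{skipDriverProbe: true}
	consume(c.State())
	fmt.Println("probe skipped:", c.probes) // prints 0
}
```

Reusing the state only fixes the inefficiency; the QEMU checks, and the error they log, would still run once. Only the skip variant keeps them out of activateifneeded entirely.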

@tomponline it took me a bit of fiddling with a fresh VM to reproduce, but I still can:

$ lxc shell v3
root@v3:~# snap install lxd --channel latest/edge
2025-07-02T16:29:02Z INFO Waiting for automatic snapd restart...
lxd (edge) git-12a10e7 from Canonical✓ installed
root@v3:~# lxd activateifneeded --debug 2>&1
DEBUG  [2025-07-02T16:29:37Z] No local database, so no need to start the daemon now 
root@v3:~# lxd init --auto
root@v3:~# lxd activateifneeded --debug 2>&1
DEBUG  [2025-07-02T16:29:54Z] No global database, so no need to start the daemon now 
root@v3:~# lxd activateifneeded --debug 2>&1
DEBUG  [2025-07-02T16:29:57Z] No global database, so no need to start the daemon now 
root@v3:~# lxc ls
To start your first container, try: lxc launch ubuntu:24.04
Or for a virtual machine: lxc launch ubuntu:24.04 --vm

+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+
root@v3:~# lxd activateifneeded --debug 2>&1
DEBUG  [2025-07-02T16:30:00Z] No global database, so no need to start the daemon now 
root@v3:~# lxc init --empty c1
Creating c1
root@v3:~# lxd activateifneeded --debug 2>&1
DEBUG  [2025-07-02T16:30:08Z] No global database, so no need to start the daemon now 
root@v3:~# snap restart lxd
2025-07-02T16:30:19Z INFO Waiting for "snap.lxd.daemon.service" to stop.
Restarted.
root@v3:~# lxd activateifneeded --debug 2>&1
INFO   [2025-07-02T16:30:23Z] Instance type operational                     driver=lxc features="map[]" type=container
ERROR  [2025-07-02T16:30:23Z] Unable to run feature checks during QEMU initialization: Unable to locate a VM UEFI firmware 
WARNING[2025-07-02T16:30:23Z] Instance type not operational                 driver=qemu err="QEMU failed to run feature checks" type=virtual-machine
DEBUG  [2025-07-02T16:30:23Z] No need to start the daemon now   

simondeziel • Jul 02 '25 16:07