lxd
No useful output from lxc commands after downgrade
Required information
- Distribution: Ubuntu
- Distribution version: 22.04
- The output of "lxc info" or if that fails:
- Kernel version: 5.15.0-1012-kvm
- LXC version: 4.0
- LXD version: 4.0
- Storage backend in use: dir
Issue description
If the database has been initialized, even if lxd init has not been run, using "snap refresh" to downgrade will make most or all lxc commands (and lxd init) useless.
It seems like virtually any lxc command, not just lxd init, will initialize the database. Once it has been initialized, downgrading lxd will mean that the schema is invalid for that downgraded version. Having the wrong schema will prevent any communication on the socket. Commands will die with messages like Error: Get "http://unix.socket/1.0": EOF
What I expect:
- non-mutating operations like lxc list or lxd init --dump do not have any observable side effects. In particular, they do not break downgrading.
- when the DB schema is invalid, lxc commands produce an error message saying that the DB schema is invalid. (This implies that lxd starts up in a very limited mode when the DB schema is invalid, instead of falling over.) Ideally they say which version of LXD is compatible with the DB.
- when /usr/bin/snap refresh --channel=4.0/stable lxd would produce a broken configuration, it does nothing and exits with an error. Presumably there would be a --force option to override.
Steps to reproduce
(recommended to do this in a vm launched with lxc launch --vm ubuntu:22.04)
- sudo /usr/bin/snap install --channel=5.0/stable lxd
- sudo lxc ls
- sudo /usr/bin/snap refresh --channel=4.0/stable lxd
- sudo lxc ls
Information to attach
- [ X ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
- [ X ] Output of the client with --debug
It would also be nice if there were a way to run lxd init to recover from this situation, e.g. lxd init --overwrite --preseed.
I'm not sure there's a lot we can do about this. The reason is that:
- There is no such thing as "initializing LXD", lxd init isn't special in any way, it just uses the normal REST API to setup storage, network, default profile, ...
- The LXD database is needed for any of the API to function so it's automatically initialized on daemon startup
- Similarly schema updates must be applied prior to any DB access, so they're applied extremely early on startup
- LXD is socket activated on Ubuntu, so will start up when any lxc command is run or when anything else hits the unix socket
- The lxc tool is just a REST API client, when the REST API isn't available because the daemon refused to start, it cannot connect and can't tell why
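Because lxc is only a REST client, the same failure can be observed by talking to the unix socket directly. The following is a minimal sketch, not an official tool: the socket path assumes the snap package, and probe_lxd_api is a hypothetical helper name. It issues the same GET /1.0 request that appears in the error message above.

```shell
#!/bin/sh
# Sketch only: probe the LXD REST API over its unix socket.
# Assumption: snap packaging, so the socket lives under /var/snap/lxd/common/lxd.
LXD_SOCKET="${LXD_SOCKET:-/var/snap/lxd/common/lxd/unix.socket}"

# probe_lxd_api [SOCKET_PATH]  (hypothetical helper)
# Returns 0 if the daemon answers, 1 if it does not, 2 if there is no socket.
probe_lxd_api() {
    sock="${1:-$LXD_SOCKET}"
    if [ ! -S "$sock" ]; then
        echo "no unix socket at $sock" >&2
        return 2
    fi
    # GET /1.0, the request seen in: Error: Get "http://unix.socket/1.0": EOF
    if ! curl --silent --show-error --fail \
            --unix-socket "$sock" http://unix.socket/1.0; then
        echo "daemon did not answer; the client cannot tell why, check the LXD log" >&2
        return 1
    fi
}
```

Running probe_lxd_api as root after the downgrade should fail in the same way lxc ls does, which illustrates that the client only sees a closed connection and has no access to the daemon's reason for exiting.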
What we do to try and help with those situations is:
- A clear downgrade error should be visible in the LXD log (/var/snap/lxd/common/lxd/logs/lxd.log)
- A similar error should also be visible in journalctl -u snap.lxd.daemon
- On DB upgrades, LXD makes a backup of the DB at /var/snap/lxd/common/lxd/database/global.bak, which can be restored should a downgrade be needed
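The checks and the restore mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not a documented procedure: the function names are hypothetical, the paths are the ones quoted in this thread, the name global.newer is made up for the set-aside copy, and the restore assumes the daemon has already been stopped.

```shell
#!/bin/sh
# Sketch only: inspect the LXD log and restore the pre-upgrade DB backup.
# Assumption: snap packaging; override LXD_DIR to point elsewhere.
LXD_DIR="${LXD_DIR:-/var/snap/lxd/common/lxd}"

# show_lxd_log [LINES]  (hypothetical helper)
# Print the tail of the daemon log, where the downgrade error should appear.
show_lxd_log() {
    log="$LXD_DIR/logs/lxd.log"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    tail -n "${1:-50}" "$log"
}

# restore_global_backup  (hypothetical helper)
# Put global.bak back in place of global. Assumes LXD is not running.
restore_global_backup() {
    db="$LXD_DIR/database"
    if [ ! -e "$db/global.bak" ]; then
        echo "no backup at $db/global.bak" >&2
        return 1
    fi
    if [ -e "$db/global.newer" ]; then
        echo "$db/global.newer already exists, refusing to overwrite" >&2
        return 1
    fi
    # Set the newer-schema database aside instead of deleting it.
    if [ -e "$db/global" ]; then
        mv "$db/global" "$db/global.newer" || return 1
    fi
    cp -a "$db/global.bak" "$db/global"
}
```

Both functions need root on a real system. Stopping the daemon first (for example with snap stop lxd) is assumed rather than done by the script, and anything created after the backup was taken will not be present in the restored database.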
@ru-fu we probably ought to add a doc page on the upgrade behavior that would more directly cover this
- There is no such thing as "initializing LXD", lxd init isn't special in any way, it just uses the normal REST API to setup storage, network, default profile, ...
Is there a reason why it can't be special? There's nothing to stop lxd init --overwrite from deleting the database, is there? Since it's not an lxc command, it seems perfectly reasonable to support local-only operations.
- The LXD database is needed for any of the API to function so it's automatically initialized on daemon startup
Until there's mutation, you have a choice about whether to preserve the database after an operation.
- The lxc tool is just a REST API client, when the REST API isn't available because the daemon refused to start, it cannot connect and can't tell why
As I mentioned, supporting a sane error message implies that the daemon doesn't just fall over when the schema is invalid. You have the option of allowing the daemon to run with an invalid schema, and on every connection report that the schema is invalid.
Is there a reason why it can't be special? There's nothing to stop lxd init --overwrite from deleting the database, is there? Since it's not an lxc command, it seems perfectly reasonable to support local-only operations.
lxd init is run as an unprivileged user and so doesn't have the write access needed to wipe the database, nor does it know what init system you're using and how to restart LXD afterwards. So this kind of thing is actually better done by the user.
To properly reset LXD, the procedure usually is:
- rm -Rf /var/snap/lxd/common/lxd
- reboot
The reboot part also takes care of wiping any kernel state, disks, networks, ... which may be in place, as merely getting rid of the database doesn't handle that.
As I mentioned, supporting a sane error message implies that the daemon doesn't just fall over when the schema is invalid. You have the option of allowing the daemon to run with an invalid schema, and on every connection report that the schema is invalid.
LXD requires database access to setup the network listeners and requires the daemon config to be read from database to setup the API handlers. The easiest way we could do something like this would be to have a completely separate listener and API handler just for this one case.