Support LVM storage pool unmount
Ever since we started to unmount storage pools on LXD shutdown (https://github.com/lxc/lxd/pull/9217), we have seen LVM errors on subsequent start up when using LVM on a loop file (although only in the test suite; it is not reproducible locally).
The errors we see are similar to:

```
EROR[09-22|23:53:40] Failed to start the daemon: Failed initializing storage pool "lxdtest-w7Q": Failed activating LVM thin pool volume "lxdtest-w7Q/LXDThinPool": Failed to run: lvchange --activate y --ignoreactivationskip lxdtest-w7Q/LXDThinPool: Activation of logical volume lxdtest-w7Q/LXDThinPool is prohibited while logical volume lxdtest-w7Q/LXDThinPool_tmeta is active.
```
Searching online, this suggests the LVM state has somehow become corrupted.
The only way I have managed to get the loop device to release itself with `SetAutoclearOnLoopDev()` is by first deactivating the thin pool volume with `lvchange -an`, or all volumes in the volume group with `vgchange -an`, which I had thought would be sufficient to ensure the volume group was deactivated cleanly.
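For illustration, a minimal Go sketch of that sequence: deactivate the volume group, then flag the loop device for autoclear. The `deactivateAndRelease` helper and its arguments are hypothetical (this is not LXD's actual `SetAutoclearOnLoopDev()` implementation); the autoclear flag itself is set with the standard `LOOP_SET_STATUS64` ioctl via `golang.org/x/sys/unix`.

```go
package loopdev

import (
	"fmt"
	"os"
	"os/exec"

	"golang.org/x/sys/unix"
)

// deactivateAndRelease (hypothetical helper) deactivates all logical
// volumes in the pool's volume group, then flags the backing loop device
// for autoclear so the kernel releases it once the last reference closes.
func deactivateAndRelease(vgName string, loopDevPath string) error {
	// Equivalent of "vgchange -an <vg>": deactivate every LV in the group.
	out, err := exec.Command("vgchange", "--activate", "n", vgName).CombinedOutput()
	if err != nil {
		return fmt.Errorf("vgchange -an %q failed: %w (%s)", vgName, err, out)
	}

	// Open the loop device and set LO_FLAGS_AUTOCLEAR via LOOP_SET_STATUS64.
	f, err := os.OpenFile(loopDevPath, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := unix.IoctlLoopGetStatus64(int(f.Fd()))
	if err != nil {
		return fmt.Errorf("LOOP_GET_STATUS64 on %q: %w", loopDevPath, err)
	}

	info.Flags |= unix.LO_FLAGS_AUTOCLEAR
	return unix.IoctlLoopSetStatus64(int(f.Fd()), info)
}
```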
So far, the approaches I have tried unsuccessfully to resolve this are:

During pool `Unmount()`:

- Switch away from `releaseLoopDev()` and try `asyncSetAutoclearOnLoopDev()` instead, whilst waiting for the volume group to disappear.
- Same as above, but instead of (or as well as) waiting for the volume group, monitor `/sys/class/block/loopN/loop/backing_file` and check that it has been deleted, indicating the loop device has been released (see the sketch after this list).
- Only deactivate the thin pool volume (with `lvchange -an`), rather than all of the volumes in the volume group (with `vgchange -an`) as was previously happening.
- As well as the above, call `sync` before calling `SetAutoclearOnLoopDev()` to try and get the loop device to flush to the backing file.
- Sleep 2 seconds at the end of `Unmount()` to avoid LVM subsystem races.
- Use `losetup` rather than `openLoopFile` and `SetAutoclearOnLoopDev`.
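As a concrete illustration of the `backing_file` monitoring attempt above, a sketch along these lines could poll sysfs until the entry disappears. The helper name and timeout are mine, not from the LXD code; the sysfs path is the one mentioned above.

```go
package loopdev

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// waitLoopReleased (hypothetical helper) polls the sysfs entry for a loop
// device (e.g. /dev/loop3 -> /sys/class/block/loop3/loop/backing_file)
// until it disappears, which indicates the kernel has released the device.
func waitLoopReleased(loopDevName string, timeout time.Duration) error {
	sysPath := filepath.Join("/sys/class/block", loopDevName, "loop", "backing_file")
	deadline := time.Now().Add(timeout)

	for {
		if _, err := os.Stat(sysPath); os.IsNotExist(err) {
			return nil // Backing file entry gone; loop device released.
		}

		if time.Now().After(deadline) {
			return fmt.Errorf("loop device %q still attached after %v", loopDevName, timeout)
		}

		time.Sleep(100 * time.Millisecond)
	}
}
```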
During `Mount()`:

- Wait for the volume group and thin pool to appear after activating the loop file (sketched below).
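A rough sketch of that wait, assuming hypothetical names and using `lvs` (which exits non-zero until the logical volume is visible to LVM) to detect when the thin pool has appeared:

```go
package loopdev

import (
	"fmt"
	"os/exec"
	"time"
)

// waitThinPoolVisible (hypothetical helper) polls LVM after the loop file
// is attached until the volume group's thin pool shows up, or times out.
func waitThinPoolVisible(vgName string, thinPoolName string, timeout time.Duration) error {
	lvPath := fmt.Sprintf("%s/%s", vgName, thinPoolName)
	deadline := time.Now().Add(timeout)

	for {
		// "lvs <vg>/<lv>" exits zero once the LV is visible to LVM.
		if err := exec.Command("lvs", lvPath).Run(); err == nil {
			return nil
		}

		if time.Now().After(deadline) {
			return fmt.Errorf("thin pool %q not visible after %v", lvPath, timeout)
		}

		time.Sleep(500 * time.Millisecond)
	}
}
```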
The issue doesn't occur on my local system (amd64), even when running the test suite on TMPFS; it only occurs on Jenkins.
One way I have found to reliably and quickly trigger the issue on Jenkins is to get the LVM `Mount()` function to try to activate the LVM thin pool volume using `lvchange --activate y --ignoreactivationskip`. This should succeed even if the thin pool is already active, but it quickly detects the problematic storage pool.
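For example, a check along these lines early in `Mount()` fails fast on an affected pool. The helper name is mine; the `lvchange` invocation matches the one in the error above.

```go
package loopdev

import (
	"fmt"
	"os/exec"
)

// checkThinPoolActivation (hypothetical helper) tries to (re-)activate the
// thin pool volume. On a healthy pool this succeeds even when the volume is
// already active, so a failure here surfaces the corrupted state early.
func checkThinPoolActivation(vgName string, thinPoolName string) error {
	lvPath := fmt.Sprintf("%s/%s", vgName, thinPoolName)

	out, err := exec.Command("lvchange", "--activate", "y", "--ignoreactivationskip", lvPath).CombinedOutput()
	if err != nil {
		return fmt.Errorf("activating %q failed: %w (%s)", lvPath, err, out)
	}

	return nil
}
```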
Some earlier attempts:
- https://github.com/lxc/lxd/pull/9276
- https://github.com/lxc/lxd/pull/9274
- https://github.com/lxc/lxd/pull/9267
- https://github.com/lxc/lxd/pull/9258
- https://github.com/lxc/lxd/pull/9253
- https://github.com/lxc/lxd/pull/9254
- https://github.com/lxc/lxd/pull/9247
- https://github.com/lxc/lxd/pull/9245