Support VM disk resize without reboot (from Incus)
This PR adds support for resizing (growing) VM disks without rebooting, when using ZFS or LVM storage backends.
Resolves https://github.com/canonical/lxd/issues/13311.
Heads up @mionaalex - the "Documentation" label was applied to this issue.
What would prevent growing live the .raw file backing a QEMU on another storage driver? Or maybe that was left for another day/PR?
I'd like it if we could explore adding support for that; we already support growing the raw disk file offline, so I'm not sure there's a reason we can't do it online.
Needs a rebase too please
@simondeziel @tomponline re: online disk resize
I don't see an issue with adding online disk resizing for ceph. RBD has an exclusive lock feature and supports online resizing with RBD client kernel > 3.10.
Thanks for checking on ceph RBD live resize capabilities! As Tom mentioned, we can already grow a plain .raw file while offline, so maybe we could do that live too now that there's a mechanism to notify QEMU about the bigger backing file.
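For the QEMU side, I'd expect the notification to boil down to QMP's `block_resize` command; here's a rough sketch of the idea (raw JSON over the monitor socket, with a made-up function name, node name and socket path - the real code would presumably go through LXD's existing QMP client):

```go
// Rough sketch only: tell a running QEMU that its backing disk grew, using the
// QMP "block_resize" command over the monitor socket. Function name, node name
// and socket path are illustrative; LXD would use its own QMP client instead.
package qmpsketch

import (
	"bufio"
	"fmt"
	"net"
)

func notifyQEMUResize(monitorPath, nodeName string, newSizeBytes int64) error {
	conn, err := net.Dial("unix", monitorPath)
	if err != nil {
		return err
	}
	defer conn.Close()

	reader := bufio.NewReader(conn)
	if _, err := reader.ReadString('\n'); err != nil { // discard the QMP greeting banner
		return err
	}

	// Capabilities negotiation is mandatory before any other command, then
	// block_resize makes QEMU pick up the new (larger) device size.
	commands := []string{
		`{"execute": "qmp_capabilities"}`,
		fmt.Sprintf(`{"execute": "block_resize", "arguments": {"node-name": %q, "size": %d}}`, nodeName, newSizeBytes),
	}

	for _, cmd := range commands {
		if _, err := conn.Write([]byte(cmd + "\n")); err != nil {
			return err
		}

		if _, err := reader.ReadString('\n'); err != nil { // read the reply/acknowledgement
			return err
		}
	}

	return nil
}
```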
@tomponline rebased and good to go. Do we want to include support for live resizing ceph disks with this PR or open up a separate issue and save it for later?
Let's try to do it as part of this PR, and then we can add a single API extension.
I've tested live resizing a Ceph RBD filesystem disk and it works as expected - it's just online resizing of Ceph RBD block volumes that doesn't work, which explains why I haven't been able to resize a Ceph-backed rootfs.
It doesn't look like we'll be able to add support for online growing of Ceph RBD root disks. Ceph-backed VMs have a read-only snapshot which can't be updated when the root disk size is updated (see below). The snapshot is used for instance creation.
https://github.com/canonical/lxd/blob/9ac2433510c825833f260621b00fa0c19e6a6ff8/lxd/storage/drivers/driver_ceph_volumes.go#L1332-L1337
Furthermore, online resizing for Ceph volumes is generally considered unsafe in LXD:
https://github.com/canonical/lxd/blob/9ac2433510c825833f260621b00fa0c19e6a6ff8/lxd/storage/drivers/driver_ceph_volumes.go#L192-L205
Rebased and good to go.
In summary, we're adding support for online resizing (growing) of any zfs or lvm disks. Online resizing Ceph RBD filesystems was possible before the changes in this PR, but we've confirmed that online resizing of Ceph RBD block volumes is not possible due to the read only snapshot used during instance creation.
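For anyone skimming, the storage-side half of the operation is essentially just growing the block device under the running VM before QEMU gets notified - conceptually something like this (a sketch only, with illustrative dataset/LV names, not the actual driver code):

```go
// Conceptual sketch of the storage-side grow for the two supported drivers,
// shelling out the same way the drivers do. Dataset/LV names are illustrative.
package growsketch

import (
	"fmt"
	"os/exec"
)

// growZFSBlockVolume grows a zvol, e.g. "pool/virtual-machines/v1.block".
// Note: volsize must be a multiple of the zvol's volblocksize.
func growZFSBlockVolume(dataset string, newSizeBytes int64) error {
	return exec.Command("zfs", "set", fmt.Sprintf("volsize=%d", newSizeBytes), dataset).Run()
}

// growLVMVolume grows a logical volume, e.g. "/dev/lxdvg/virtual-machines_v1".
func growLVMVolume(lvPath string, newSizeBytes int64) error {
	return exec.Command("lvextend", "-L", fmt.Sprintf("%db", newSizeBytes), lvPath).Run()
}
```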
zvols have a similar read-only snapshot as their origin; I guess it's an inherent limitation of how CoW is implemented in Ceph. Thanks for digging into it.
I'm now wondering what's up with dir and .raw files though. Since they're raw files with no CoW going on, I'd expect a live grow to work for them too.
https://docs.ceph.com/en/reef/rbd/rbd-snapshot/#layering seems to suggest it should just work:
A copy-on-write clone of a snapshot behaves exactly like any other Ceph block device image. You can read to, write from, clone, and resize cloned images. There are no special restrictions with cloned images.
But since you ran into issues, maybe we need to flatten those cloned images before growing them? https://docs.ceph.com/en/reef/rbd/rbd-snapshot/#flattening-a-cloned-image
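What I have in mind is really just shelling out to `rbd flatten` before the grow, along these lines (sketch only; pool/image names and the MiB-based size handling are illustrative):

```go
// Sketch only: detach the cloned RBD image from its protected parent snapshot,
// then grow it. Pool/image names are illustrative.
package rbdflattensketch

import (
	"os/exec"
	"strconv"
)

func flattenAndGrow(pool, image string, newSizeMiB int64) error {
	// Flattening copies the parent snapshot's data into the clone, so the
	// clone no longer references the read-only parent.
	if err := exec.Command("rbd", "flatten", pool+"/"+image).Run(); err != nil {
		return err
	}

	// rbd resize interprets --size in MiB by default; only growing here.
	return exec.Command("rbd", "resize", pool+"/"+image, "--size",
		strconv.FormatInt(newSizeMiB, 10)).Run()
}
```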
Should we add a row for live VM disk resize in the storage driver features table? See: https://documentation.ubuntu.com/lxd/en/latest/reference/storage_drivers/#feature-comparison
+1
Thanks for digging into this further :)
Given my initial research, your new findings, and what I've seen in the LXD codebase, I believe it is theoretically possible to online resize (grow) Ceph RBD block volumes, dir and .raw files.
I think I have some more work to do for this PR.
I don't think flattening the cloned image is a safe approach. From the docs:
Since a flattened image contains all the data stored in the snapshot, a flattened image takes up more storage space than a layered clone does.
So although it is possible to online grow a Ceph RBD backed root disk, I found another problem:
When we create a Ceph RBD volume, a read-only snapshot is created. This read-only snapshot is used as the clone source for future non-image volumes. The read-only (protected) property of the snapshot is a precondition for creating RBD clones.
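To spell out the chain we rely on, it's roughly the standard snapshot/protect/clone sequence (sketch with illustrative names, not the driver code itself):

```go
// Sketch of the clone chain: a protected snapshot of the base volume is the
// clone source for every derived volume. Names are illustrative.
package rbdclonesketch

import "os/exec"

func cloneFromProtectedSnapshot(pool, base, snap, clone string) error {
	parent := pool + "/" + base + "@" + snap

	steps := [][]string{
		// Snapshots are read-only by nature; protecting one additionally
		// prevents deletion while clones exist and is the precondition for
		// creating RBD clones mentioned above.
		{"rbd", "snap", "create", parent},
		{"rbd", "snap", "protect", parent},
		{"rbd", "clone", parent, pool + "/" + clone},
	}

	for _, step := range steps {
		if err := exec.Command(step[0], step[1:]...).Run(); err != nil {
			return err
		}
	}

	return nil
}
```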
That initial image turned into a cloned read-only snapshot really maps to my understanding of how it works with ZFS. Still not clear why/what's different with Ceph RBD volumes :/
For reference, here is the error I'm getting after modifying the behaviour to allow for online growing the root disk, and adding a file system resize:
root@testbox:~# lxc config device set v1 root size=11GiB
Error: Failed to update device "root": Could not grow underlying "ext4" filesystem for "/dev/rbd0": Failed to run: resize2fs /dev/rbd0: exit status 1 (resize2fs 1.47.0 (5-Feb-2023)
resize2fs: Bad magic number in super-block while trying to open /dev/rbd0)
underlying "ext4" seems misleading as it seems to be code running in the host itself as operating on /dev/rbd0. Also, why would it do that? I'd expect only the VM's /dev/sda to be bigger, no partition touched, no FS resized.
Same for /dev/rbd0, shouldn't it just be bigger?
I've updated the PR and the tests are good to go. I've opened a new issue to track adding support for Ceph RBD volumes.
I don't mind (too much) having this feature land in a per-driver fashion. However, I suspect/hope that Ceph is the special case here and all our other drivers would support live growing. I didn't hear back from you regarding the easy-to-test dir backend?
Next, we'll need to consider Powerflex and the other driver that's still baking.
dir is not supported with the changes in this PR thus far. I'm working on adding support for it :)
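For dir (and plain .raw files generally), the grow itself should just be extending the sparse file before QEMU is notified - roughly this (sketch only; the path and function name are made up):

```go
// Sketch: grow a dir-backed raw VM disk by extending the sparse file.
// The actual path layout and the QEMU notification are handled elsewhere.
package rawgrowsketch

import (
	"fmt"
	"os"
)

func growRawFile(path string, newSizeBytes int64) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}

	// Only growing is supported; shrinking under a running guest is unsafe.
	if newSizeBytes < info.Size() {
		return fmt.Errorf("shrinking %q from %d to %d bytes is not supported", path, info.Size(), newSizeBytes)
	}

	// Truncating upwards extends the sparse file without writing any data.
	return os.Truncate(path, newSizeBytes)
}
```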
@tomponline mentioned that Powerflex is out of scope for this PR.
@kadinsayani From what I can see this may also help with container live resizing (for both growing and shrinking) on block based drivers (i.e. lvm, ceph and zfs with volumes.zfs.block_mode enabled), as it currently is also not possible. I am also assuming this would not apply to ceph for the same reason we apparently can't resize VMs on it. To what extent are these assumptions correct?
Online shrinking is only possible for filesystem volumes. Online growing of block based drivers (zfs and lvm) will be possible for containers once this PR is merged (with volumes.zfs.block_mode enabled). Online growing of Ceph RBD block volumes is still under investigation, see https://github.com/canonical/lxd/issues/14462.
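The filesystem half for containers can also happen online, since the common filesystems support growing while mounted; conceptually something like this (sketch only, with illustrative device/mount paths):

```go
// Sketch: once the block device under a container volume has been grown, the
// mounted filesystem can be grown online too. Paths/types are illustrative.
package fsgrowsketch

import (
	"fmt"
	"os/exec"
)

func growMountedFilesystem(fsType, devPath, mountPath string) error {
	switch fsType {
	case "ext4":
		// resize2fs grows a mounted ext4 filesystem to fill the device.
		return exec.Command("resize2fs", devPath).Run()
	case "xfs":
		// xfs_growfs operates on the mount point rather than the device.
		return exec.Command("xfs_growfs", mountPath).Run()
	case "btrfs":
		return exec.Command("btrfs", "filesystem", "resize", "max", mountPath).Run()
	default:
		return fmt.Errorf("online grow not implemented for %q filesystems", fsType)
	}
}
```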
@kadinsayani can we close this for now until you get a chance to look at this again?