Support VM disk resize without reboot (from Incus)
This PR adds support for resizing (growing) VM disks without rebooting, when using ZFS or LVM storage backends.
Resolves https://github.com/canonical/lxd/issues/13311.
Heads up @mionaalex - the "Documentation" label was applied to this issue.
What would prevent growing live the .raw file backing a QEMU on another storage driver? Or maybe that was left for another day/PR?
I'd like it if we could explore adding support for that; we already support growing the raw disk file offline, so I'm not sure there's a reason we can't do it online.
Needs a rebase too please
@simondeziel @tomponline re: online disk resize
I don't see an issue with adding online disk resizing for ceph. RBD has an exclusive lock feature and supports online resizing with RBD client kernel > 3.10.
Thanks for checking on ceph RBD live resize capabilities! As Tom mentioned, we can already grow a plain .raw file while offline, so maybe we could do that live too now that there's a mechanism to notify QEMU about the bigger backing file.
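For the QEMU side, I'd expect the notification to boil down to QMP's `block_resize` command; here's a rough sketch of the idea (raw JSON over the monitor socket, with a made-up function name, node name and socket path - the real code would presumably go through LXD's existing QMP client):

```go
// Rough sketch only: tell a running QEMU that its backing disk grew, using the
// QMP "block_resize" command over the monitor socket. Function name, node name
// and socket path are illustrative; LXD would use its own QMP client instead.
package qmpsketch

import (
	"bufio"
	"fmt"
	"net"
)

func notifyQEMUResize(monitorPath, nodeName string, newSizeBytes int64) error {
	conn, err := net.Dial("unix", monitorPath)
	if err != nil {
		return err
	}
	defer conn.Close()

	reader := bufio.NewReader(conn)
	if _, err := reader.ReadString('\n'); err != nil { // discard the QMP greeting banner
		return err
	}

	// Capabilities negotiation is mandatory before any other command, then
	// block_resize makes QEMU pick up the new (larger) device size.
	commands := []string{
		`{"execute": "qmp_capabilities"}`,
		fmt.Sprintf(`{"execute": "block_resize", "arguments": {"node-name": %q, "size": %d}}`, nodeName, newSizeBytes),
	}

	for _, cmd := range commands {
		if _, err := conn.Write([]byte(cmd + "\n")); err != nil {
			return err
		}

		if _, err := reader.ReadString('\n'); err != nil { // read the reply/acknowledgement
			return err
		}
	}

	return nil
}
```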
@tomponline rebased and good to go. Do we want to include support for live resizing ceph disks with this PR or open up a separate issue and save it for later?
Let's try to do it as part of this PR, and then we can add a single API extension.
I've tested live resizing a Ceph RBD filesystem disk and it works as expected - it's just online resizing of Ceph RBD block volumes that doesn't work, which explains why I haven't been able to resize a Ceph-backed rootfs.
It doesn't look like we'll be able to add support for online growing of Ceph RBD root disks. Ceph-backed VMs have a read-only snapshot which can't be updated when the root disk size is updated (see below). The snapshot is used for instance creation.
https://github.com/canonical/lxd/blob/9ac2433510c825833f260621b00fa0c19e6a6ff8/lxd/storage/drivers/driver_ceph_volumes.go#L1332-L1337
Furthermore, online resizing for Ceph volumes is generally considered unsafe in LXD:
https://github.com/canonical/lxd/blob/9ac2433510c825833f260621b00fa0c19e6a6ff8/lxd/storage/drivers/driver_ceph_volumes.go#L192-L205
Rebased and good to go.
In summary, we're adding support for online resizing (growing) of any zfs or lvm disks. Online resizing Ceph RBD filesystems was possible before the changes in this PR, but we've confirmed that online resizing of Ceph RBD block volumes is not possible due to the read only snapshot used during instance creation.
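For anyone skimming, the storage-side half of the operation is essentially just growing the block device under the running VM before QEMU gets notified - conceptually something like this (a sketch only, with illustrative dataset/LV names, not the actual driver code):

```go
// Conceptual sketch of the storage-side grow for the two supported drivers,
// shelling out the same way the drivers do. Dataset/LV names are illustrative.
package growsketch

import (
	"fmt"
	"os/exec"
)

// growZFSBlockVolume grows a zvol, e.g. "pool/virtual-machines/v1.block".
// Note: volsize must be a multiple of the zvol's volblocksize.
func growZFSBlockVolume(dataset string, newSizeBytes int64) error {
	return exec.Command("zfs", "set", fmt.Sprintf("volsize=%d", newSizeBytes), dataset).Run()
}

// growLVMVolume grows a logical volume, e.g. "/dev/lxdvg/virtual-machines_v1".
func growLVMVolume(lvPath string, newSizeBytes int64) error {
	return exec.Command("lvextend", "-L", fmt.Sprintf("%db", newSizeBytes), lvPath).Run()
}
```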
zvols have a similar read-only snapshot as their origin; I guess it's an inherent limitation of how CoW is implemented in Ceph. Thanks for digging into it.
I'm now wondering what's up with dir and .raw files though. Since they're raw files with no CoW going on, I'd expect a live grow to work for them too.
https://docs.ceph.com/en/reef/rbd/rbd-snapshot/#layering seems to suggest it should just work:
A copy-on-write clone of a snapshot behaves exactly like any other Ceph block device image. You can read to, write from, clone, and resize cloned images. There are no special restrictions with cloned images.
But since you ran into issues, maybe we need to flatten those cloned images before growing them? https://docs.ceph.com/en/reef/rbd/rbd-snapshot/#flattening-a-cloned-image
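What I have in mind is really just shelling out to `rbd flatten` before the grow, along these lines (sketch only; pool/image names and the MiB-based size handling are illustrative):

```go
// Sketch only: detach the cloned RBD image from its protected parent snapshot,
// then grow it. Pool/image names are illustrative.
package rbdflattensketch

import (
	"os/exec"
	"strconv"
)

func flattenAndGrow(pool, image string, newSizeMiB int64) error {
	// Flattening copies the parent snapshot's data into the clone, so the
	// clone no longer references the read-only parent.
	if err := exec.Command("rbd", "flatten", pool+"/"+image).Run(); err != nil {
		return err
	}

	// rbd resize interprets --size in MiB by default; only growing here.
	return exec.Command("rbd", "resize", pool+"/"+image, "--size",
		strconv.FormatInt(newSizeMiB, 10)).Run()
}
```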
Should we add a row for live VM disk resize in the storage driver features table? See: https://documentation.ubuntu.com/lxd/en/latest/reference/storage_drivers/#feature-comparison
+1
Thanks for digging into this further :)
Given my initial research, your new findings, and what I've seen in the LXD codebase, I believe it is theoretically possible to online resize (grow) Ceph RBD block volumes, dir and .raw files.
I think I have some more work to do for this PR.
I don't think flattening the cloned image is a safe approach. From the docs:
Since a flattened image contains all the data stored in the snapshot, a flattened image takes up more storage space than a layered clone does.
So although it is possible to online grow a Ceph RBD backed root disk, I found another problem:
When we create a Ceph RBD volume, a read-only snapshot is created. This read-only snapshot is used as the clone source for future non-image volumes. The read-only (protected) property of the snapshot is a precondition for creating RBD clones.
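To spell out the chain we rely on, it's roughly the standard snapshot/protect/clone sequence (sketch with illustrative names, not the driver code itself):

```go
// Sketch of the clone chain: a protected snapshot of the base volume is the
// clone source for every derived volume. Names are illustrative.
package rbdclonesketch

import "os/exec"

func cloneFromProtectedSnapshot(pool, base, snap, clone string) error {
	parent := pool + "/" + base + "@" + snap

	steps := [][]string{
		// Snapshots are read-only by nature; protecting one additionally
		// prevents deletion while clones exist and is the precondition for
		// creating RBD clones mentioned above.
		{"rbd", "snap", "create", parent},
		{"rbd", "snap", "protect", parent},
		{"rbd", "clone", parent, pool + "/" + clone},
	}

	for _, step := range steps {
		if err := exec.Command(step[0], step[1:]...).Run(); err != nil {
			return err
		}
	}

	return nil
}
```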
That initial image turned into a cloned read-only snapshot really maps to my understanding of how it works with ZFS. Still not clear why/what's different with Ceph RBD volumes :/
For reference, here is the error I'm getting after modifying the behaviour to allow for online growing the root disk, and adding a file system resize:
root@testbox:~# lxc config device set v1 root size=11GiB
Error: Failed to update device "root": Could not grow underlying "ext4" filesystem for "/dev/rbd0": Failed to run: resize2fs /dev/rbd0: exit status 1 (resize2fs 1.47.0 (5-Feb-2023)
resize2fs: Bad magic number in super-block while trying to open /dev/rbd0)
underlying "ext4" seems misleading as it seems to be code running in the host itself as operating on /dev/rbd0. Also, why would it do that? I'd expect only the VM's /dev/sda to be bigger, no partition touched, no FS resized.
Same for /dev/rbd0, shouldn't it just be bigger?
I've updated the PR and the tests are good to go. I've opened a new issue to track adding support for Ceph RBD volumes.
I don't mind (too much) having this feature land in a per-driver fashion. However, I suspect/hope that Ceph is the special case here and all our other drivers would support live growing. I didn't hear back from you regarding the easy-to-test dir backend?
Next, we'll need to consider Powerflex and the other driver that's still baking.
dir is not supported with the changes in this PR thus far. I'm working on adding support for it :)
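For dir (and plain .raw files generally), the grow itself should just be extending the sparse file before QEMU is notified - roughly this (sketch only; the path and function name are made up):

```go
// Sketch: grow a dir-backed raw VM disk by extending the sparse file.
// The actual path layout and the QEMU notification are handled elsewhere.
package rawgrowsketch

import (
	"fmt"
	"os"
)

func growRawFile(path string, newSizeBytes int64) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}

	// Only growing is supported; shrinking under a running guest is unsafe.
	if newSizeBytes < info.Size() {
		return fmt.Errorf("shrinking %q from %d to %d bytes is not supported", path, info.Size(), newSizeBytes)
	}

	// Truncating upwards extends the sparse file without writing any data.
	return os.Truncate(path, newSizeBytes)
}
```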
@tomponline mentioned that Powerflex is out of scope for this PR.
@kadinsayani From what I can see this may also help with container live resizing (for both growing and shrinking) on block based drivers (i.e. lvm, ceph and zfs with volumes.zfs.block_mode enabled), as it currently is also not possible. I am also assuming this would not apply to ceph for the same reason we apparently can't resize VMs on it. To what extent are these assumptions correct?
Online shrinking is only possible for filesystem volumes. Online growing of block based drivers (zfs and lvm) will be possible for containers once this PR is merged (with volumes.zfs.block_mode enabled). Online growing of Ceph RBD block volumes is still under investigation, see https://github.com/canonical/lxd/issues/14462.
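The filesystem half for containers can also happen online, since the common filesystems support growing while mounted; conceptually something like this (sketch only, with illustrative device/mount paths):

```go
// Sketch: once the block device under a container volume has been grown, the
// mounted filesystem can be grown online too. Paths/types are illustrative.
package fsgrowsketch

import (
	"fmt"
	"os/exec"
)

func growMountedFilesystem(fsType, devPath, mountPath string) error {
	switch fsType {
	case "ext4":
		// resize2fs grows a mounted ext4 filesystem to fill the device.
		return exec.Command("resize2fs", devPath).Run()
	case "xfs":
		// xfs_growfs operates on the mount point rather than the device.
		return exec.Command("xfs_growfs", mountPath).Run()
	case "btrfs":
		return exec.Command("btrfs", "filesystem", "resize", "max", mountPath).Run()
	default:
		return fmt.Errorf("online grow not implemented for %q filesystems", fsType)
	}
}
```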
@kadinsayani can we close this for now until you get a chance to look at this again?