BTRFS starting containers with quota too quickly

Open jamielsharief opened this issue 4 years ago • 6 comments

I am using LXD stable with a custom partition formatted with BTRFS. Jobs such as creating a container or restoring from a backup report their status as complete prematurely: if you then try to carry out an operation such as starting the container, it fails.

If you create a container and then patch it to set a quota for the disk size, you need to wait approximately 15-20 seconds before you can start the container without triggering errors. If you restore from a backup, the wait is much longer.
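
As a stopgap for the delay described above, the start can be retried until it succeeds. This is only a sketch of a workaround, not a fix and not anything LXD provides: the `retry` helper, the attempt count, and the delay are all assumptions chosen for illustration.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted,
# sleeping DELAY_SECONDS between attempts.
retry() {
    _retry_attempts=$1
    _retry_delay=$2
    shift 2
    _retry_n=1
    while :; do
        # Success: stop retrying.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report to stderr and fail.
        if [ "$_retry_n" -ge "$_retry_attempts" ]; then
            echo "retry: giving up after $_retry_n attempts: $*" >&2
            return 1
        fi
        _retry_n=$((_retry_n + 1))
        sleep "$_retry_delay"
    done
}

# Example usage (the timings are guesses based on the 15-20 second delay above):
#   lxc config device add c1 root disk pool=default path=/ size=1GB
#   retry 10 5 lxc start c1
```

This only hides the symptom; it does not explain why the start fails in the first place.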

Basically, I am getting all kinds of quota-related errors, ranging from failures to set permissions to errors like the one below:

Failed preparing container for start: Failed to create file \"/var/snap/lxd/common/lxd/containers/ubuntu/backup.yaml\": open /var/snap/lxd/common/lxd/containers/ubuntu/backup.yaml: disk quota exceeded

and (the filename varies)

Failed preparing container for start: Failed to change ownership of: /var/snap/lxd/common/lxd/storage-pools/default/containers/itest/rootfs/usr/lib/apt/apt.systemd.daily

jamielsharief avatar Feb 15 '21 21:02 jamielsharief

It's not something we've ever noticed in any of our tests, nor have we seen any reports of it so far. LXD issues simple btrfs qgroup limit commands and waits for each command to return. There is no mention in the btrfs documentation of any of this being asynchronous, so I'd be tempted to consider this a kernel bug.

Can you run nsenter --mount=/run/snapd/ns/lxd.mnt /snap/lxd/current/bin/btrfs qgroup show -pcreF /var/snap/lxd/common/lxd/containers/NAME to confirm the quota is immediately set by LXD and is of the correct value?

stgraber avatar Feb 16 '21 05:02 stgraber

It looks like it is. The moment I created the container, I ran this command for a 1GB quota. If I run sudo btrfs quota rescan /btrfs -w, it does not even take 1 second, so the 15-30 second delay seems odd. Is LXD doing something in the background?

sudo nsenter --mount=/run/snapd/ns/lxd.mnt /snap/lxd/current/bin/btrfs qgroup show -pcreF /var/snap/lxd/common/lxd/containers/c1
qgroupid         rfer         excl     max_rfer     max_excl parent  child 
--------         ----         ----     --------     -------- ------  ----- 
0/629       416.18MiB     13.68MiB    953.67MiB         none ---     ---  
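
The max_rfer value above can be checked against the requested quota with a quick unit conversion. This sketch assumes that LXD treats `1GB` as a decimal gigabyte (10^9 bytes) and that btrfs reports sizes in binary units (1 MiB = 1024 * 1024 bytes).

```shell
# Requested quota, assuming 1GB means 10^9 bytes.
quota_bytes=1000000000

# Convert to MiB, the unit shown in the qgroup table.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
quota_mib=$(LC_ALL=C awk -v b="$quota_bytes" 'BEGIN { printf "%.2f", b / (1024 * 1024) }')

echo "${quota_mib}MiB"   # prints 953.67MiB
```

The result matches the 953.67MiB shown in the max_rfer column, so the limit appears to have been applied with the expected value.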

I did find this on the BTRFS wiki: [image]

jamielsharief avatar Feb 16 '21 11:02 jamielsharief

LXD doesn't do anything in the background when it comes to storage handling.

It could be unaccounted data in the page cache, which is why they suggest running sync. But that's not a viable solution, as sync is a global, system-wide action which can take minutes and pretty much block I/O for everyone else...

If btrfs is doing some stuff in the background, they should provide a command line flag so we can block until whatever it's doing is done...

stgraber avatar Feb 16 '21 13:02 stgraber

This is how it can be reproduced from the command line:

$ lxc init ubuntu: c1
$ lxc config device add c1 root disk pool=default path=/ size=1GB && lxc start c1
Device root added to c1
Error: Failed preparing container for start: Failed to change ownership of: /var/snap/lxd/common/lxd/storage-pools/default/containers/c1/rootfs/usr/sbin/xfs_scrub
Try `lxc info --show-log c1` for more info

Doing the following causes it to hang, and I have to press Ctrl-C:

$ lxc launch ubuntu:20.04 c2
$ lxc stop c2
$ lxc config device add c2 root disk pool=default path=/ size=1GB && lxc start c2

This is the log for the hanging container:

$ lxc info --show-log c2
lxc c2 20210217100121.870 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1126 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.c2"
lxc c2 20210217100121.870 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1126 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.c2"
lxc c2 20210217100121.872 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1547 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc c2 20210217100121.910 ERROR    utils - utils.c:__safe_mount_beneath_at:1106 - Function not implemented - Failed to open 30(dev)
lxc c2 20210217100121.929 ERROR    utils - utils.c:__safe_mount_beneath_at:1106 - Function not implemented - Failed to open 33(full)
lxc c2 20210217100121.929 ERROR    utils - utils.c:__safe_mount_beneath_at:1106 - Function not implemented - Failed to open 33(null)
lxc c2 20210217100121.929 ERROR    utils - utils.c:__safe_mount_beneath_at:1106 - Function not implemented - Failed to open 33(random)
lxc c2 20210217100121.929 ERROR    utils - utils.c:__safe_mount_beneath_at:1106 - Function not implemented - Failed to open 33(tty)
lxc c2 20210217100121.929 ERROR    utils - utils.c:__safe_mount_beneath_at:1106 - Function not implemented - Failed to open 33(urandom)
lxc c2 20210217100121.929 ERROR    utils - utils.c:__safe_mount_beneath_at:1106 - Function not implemented - Failed to open 33(zero)

jamielsharief avatar Feb 17 '21 10:02 jamielsharief

If you think 1GB is too small, the same happens with 5GB:

j@lxd-btrfs:~$ lxc init ubuntu: c3
Creating c3
j@lxd-btrfs:~$ lxc config device add c3 root disk pool=default path=/ size=5GB && lxc start c3
Device root added to c3
Error: Failed preparing container for start: Failed to change ownership of: /var/snap/lxd/common/lxd/storage-pools/default/containers/c3/rootfs/usr/share/mime/audio/x-psf.xml
Try `lxc info --show-log c3` for more info
j@lxd-btrfs:~$ lxc info --show-log c3
Name: c3
Location: none
Remote: unix://
Architecture: aarch64
Created: 2021/02/17 10:22 UTC
Status: Stopped
Type: container
Profiles: default

Log:



jamielsharief avatar Feb 17 '21 10:02 jamielsharief

I think this issue is related: in the end it worked after some time, which is the same thing I have noticed. https://github.com/lxc/lxd/issues/4321

jamielsharief avatar Feb 17 '21 14:02 jamielsharief

@jamielsharief do you still see this as an issue in the latest LXD?

tomponline avatar Oct 24 '22 19:10 tomponline