lxd icon indicating copy to clipboard operation
lxd copied to clipboard

Modifying disk quotas causes unnecessary file attribute updates

Open mariusklocke opened this issue 1 year ago • 3 comments

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04 LTS
  • Storage Backend: dir Backend on XFS with project quotas enabled
  • The output of snap list --all lxd core20 core22 core24 snapd:
core20  20231123       2105   latest/stable  canonical✓  base,disabled
core20  20240111       2182   latest/stable  canonical✓  base
lxd     5.0.2-d4d8da9  26741  5.0/stable     canonical✓  disabled,held
lxd     5.0.3-51452c3  26881  5.0/stable     canonical✓  held
snapd   2.61.1         20671  latest/stable  canonical✓  snapd,disabled
snapd   2.61.2         21184  latest/stable  canonical✓  snapd

Issue description

Updating the root disk size in LXD takes longer for containers with many files, because every file in the root disk is touched. I observed cases where the instance update took more than 5 minutes. LXD updates the fsx_projid and fsx_xflags attributes of all "regular" files. These updates are no-op from my point of view and should be skipped. I cannot image a scenario where there would be a change for those attributes during the lifecycle of a container. Touching every file slows down instance updates although there are no actual changes to the file attributes. I think the code responsible is in here. I think it should be the same issue on EXT4 (not tested).

Steps to reproduce

  1. Create a storage backend with dir driver on a XFS volume with project quotas enabled
  2. Create a container with root disk size of 10GiB
  3. Check extended file attributes using xfs_quota -x -c 'stat' /mnt/containers/c-1/rootfs/etc/hosts
  4. Check ctime using stat /mnt/containers/u1/rootfs/etc/hosts
  5. Update container root disk size to 15GiB: lxc config device set c-1 root size="15GiB"
  6. Re-check extended file attributes and ctime: ctime has been updated, but no file attributes have changed

mariusklocke avatar Mar 12 '24 14:03 mariusklocke

If you agree to my conclusions: I already spotted an easy fix for this issue. The setProject function is only called from here. Some lines above there already is a check if the quota project ID has changed. I think the issue would be resolved, if the quota.setProject() call would be moved to into the if statement body above.

mariusklocke avatar Mar 13 '24 10:03 mariusklocke

Is there missing anything in my issue report? I would appreciate any kind of feedback.

I recently discovered, that altering every file also causes errors, when the container executes a process where files are deleted while the instance update in LXD is still executing:

Error: Failed to update device "root": lstat /var/snap/lxd/common/lxd/storage-pools/default/containers/my-awesome-container/rootfs/tmp/YmCbeF: no such file or directory

mariusklocke avatar Apr 10 '24 12:04 mariusklocke

Thanks for your report. We will endeavour to investigate this soon.

I'm not sure why the files are all modified, although I suspect its to do with ensuring all files take part in the quota. Perhaps there is a better way.

tomponline avatar Apr 10 '24 12:04 tomponline

I just made a PR for this. We encounter the "no such file or directory" error now regularly and have no other option than to retry, which is a bummer for an operation which can take more than 5 minutes. I think a bugfix should be made for LXD 5.21

mariusklocke avatar Jul 24 '24 11:07 mariusklocke

We encounter the "no such file or directory" error now regularly and have no other option than to retry, which is a bummer for an operation which can take more than 5 minutes.

A quick fix would be to have LXD detect this sort of error and continue rather than failing.

As for not updating project on all files, this will need more consideration I think.

tomponline avatar Jul 24 '24 11:07 tomponline

I guess that how "quick" that fix would depend on who does the coding. I am not a Go developer at all and re-arranging code to fix a logical error is something I can contribute quickly. Getting into how error handling in Go works, is something very different.

mariusklocke avatar Jul 24 '24 12:07 mariusklocke

See if https://github.com/canonical/lxd/pull/13815 helps once it lands in latest/edge.

To be clear im not rejecting your PR, but in isolation I don't believe it will prevent the "no such file or directory" errors you're getting because the error appears to be happening in quota.SetProject.

Admittedly the likely hood of it happening is reduced by your PR as LXD won't do two walks of the container's filesystem, first to remove the project and then to re-add the new one.

I think your PR could provide a speed up indeed, but I will need to consider it more carefully because I'd like to understand why it was done that way originally before changing it.

tomponline avatar Jul 24 '24 12:07 tomponline

I'm not entirely happy, but thanks for providing an alternative solution to the issue. My primary use case is all containers with all limited root disks. If I understand the handling of quota project IDs in LXD correctly, they should never actually change except there is a new volume ID for the root disk (don't know if that ever happens). Only then the "file walk" should be required. My PR would fix performance and also "not found" errors for my use case, because the "file walk" would never happen in my use case (maybe once on container creation?). But maybe there are other use cases.

Any chance to get your fix in the currently supported LTS versions? Still preparing the LXD 5.21 update.

mariusklocke avatar Jul 24 '24 13:07 mariusklocke

Only then the "file walk" should be required. My PR would fix performance and also "not found" errors for my use case, because the "file walk" would never happen in my use case (maybe once on container creation?). But maybe there are other use cases.

Understood, once my PR lands, if you can rebase your PR so its mergeable and then I'll review in more detail. It makes sense to have the file missing check there in any case.

Any chance to get your fix in the currently supported LTS versions? Still preparing the LXD 5.21 update.

Yeah ill backport this and it'll be in LXD 5.21.3.

tomponline avatar Jul 24 '24 13:07 tomponline

@tomponline I see that there have been backports to stable-5.21 in the last weeks, but the fix for this issue wasn't included?

mariusklocke avatar Sep 25 '24 12:09 mariusklocke

I've included it in https://github.com/canonical/lxd/pull/14185

tomponline avatar Oct 01 '24 09:10 tomponline

I can confirm the issue is fixed in LXD 5.21.3 currently available in snap channel 5.21/candidate. Now waiting for the patch release to reach 5.21/stable.

mariusklocke avatar Feb 05 '25 13:02 mariusklocke

Excellent, this will be going to 5.21/stable soon.

tomponline avatar Feb 05 '25 14:02 tomponline