containerd
support setting a quota on overlay snapshots
as discussed in issue #3329
Fixes https://github.com/containerd/containerd/issues/3329
Enable quota as follows:
- Make sure the root directory of containerd is mounted with 'pquota'
- Set 'overlay' as the default snapshot and enable quota in 'overlay' config
- Set the default quota size in CRI
`config.toml` like this:
```toml
[plugins.'io.containerd.cri.v1.runtime']
enable_selinux = false
default_snapshot_quota_size = '2M' #+
...
[plugins.'io.containerd.snapshotter.v1.overlayfs']
root_path = ''
upperdir_label = false
sync_remove = false
slow_chown = false
mount_options = []
enable_quota = true #+
```
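The size strings used here and later in the thread ("2M", "128MB") imply converting a human-readable size into a byte count somewhere. Below is a minimal sketch of such a parser; `parseQuotaSize` is a hypothetical helper for illustration only, not the PR's actual code (the real implementation may use an existing units library):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseQuotaSize is a hypothetical helper that converts a human-readable
// size such as "2M" or "128MB" into a byte count (binary multiples).
func parseQuotaSize(s string) (uint64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	s = strings.TrimSuffix(s, "B") // accept both "2M" and "2MB"
	multipliers := map[string]uint64{
		"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40,
	}
	mult := uint64(1)
	if n := len(s); n > 0 {
		if m, ok := multipliers[s[n-1:]]; ok {
			mult = m
			s = s[:n-1]
		}
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quota size: %w", err)
	}
	return v * mult, nil
}

func main() {
	for _, s := range []string{"2M", "128MB", "15G"} {
		v, _ := parseQuotaSize(s)
		fmt.Println(s, "=", v)
	}
}
```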
Check the containerd root mount info:
```
# /etc/fstab
...
/dev/mapper/containerd_root /var/lib/containerd xfs defaults,pquota 0 0

# mount ...
/dev/mapper/containerd_root on /var/lib/containerd type xfs (rw,relatime,prjquota)
```
Check that the container quota has been set:

```
xfs_quota -x -c "report -h" /var/lib/containerd/
```
Signed-off-by: Yang Yang [email protected]
Hi @yylt. Thanks for your PR.
I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Build succeeded.
- containerd-build-arm64 : SUCCESS in 4m 52s (non-voting)
- containerd-test-arm64 : SUCCESS in 6m 12s (non-voting)
- containerd-integration-test-arm64 : SUCCESS in 24m 31s (non-voting)
Quota handling is a topic we should discuss at a high level first and possibly a good topic for a future community meeting. However, getting library support for setting project quotas across different backing FS makes sense to do now.
As an end feature, we need to discuss how it will be integrated with Kubernetes and how we can use it from our own tooling and go libraries. It seems backwards to have the cri layer here enable the quota then the snapshot set a static quota. I would think the quota would be set by the client adding a quota and the snapshotter enabling it via configuration and erroring out if the backing FS couldn't support it when enabled. The difficult part is figuring out from the client perspective whether quota is enabled so it knows whether it can avoid expensive ephemeral storage accounting (such as in the Kubelet).
In Kubernetes, there is a feature gate and a check method, SupportsQuotas; see the KEP "Quotas for Ephemeral Storage" from Kubernetes: https://github.com/kubernetes/enhancements/issues/1029
https://github.com/kubernetes/kubernetes/blob/226323178e23b4c476001266beab1e2f116b3879/pkg/volume/util/fsquota/common/quota_linux_common_impl.go#L180-L190
```go
// SupportsQuotas determines whether the filesystem supports quotas.
func SupportsQuotas(mountpoint string, qType QuotaType) (bool, error) {
	data, err := runXFSQuotaCommand(mountpoint, "state -p")
	if err != nil {
		return false, err
	}
	if qType == FSQuotaEnforcing {
		return strings.Contains(data, "Enforcement: ON"), nil
	}
	return strings.Contains(data, "Accounting: ON"), nil
}
```
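The check above boils down to string-matching the `xfs_quota -x -c "state -p" <mountpoint>` output. A self-contained sketch of that matching logic (the sample text in `main` is illustrative, not captured from a real system):

```go
package main

import (
	"fmt"
	"strings"
)

// quotaState mirrors the string matching used by the Kubernetes
// SupportsQuotas check: it inspects the text printed by
// `xfs_quota -x -c "state -p" <mountpoint>`.
func quotaState(output string, enforcing bool) bool {
	if enforcing {
		return strings.Contains(output, "Enforcement: ON")
	}
	return strings.Contains(output, "Accounting: ON")
}

func main() {
	// Illustrative output shape; real xfs_quota output has more fields.
	sample := "Project quota state on /var/lib/containerd (/dev/sdb)\n" +
		"  Accounting: ON\n  Enforcement: ON\n"
	fmt.Println("accounting:", quotaState(sample, false))
	fmt.Println("enforcing:", quotaState(sample, true))
}
```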
Sorry for replying after such a long time.

> As an end feature, we need to discuss how it will be integrated with Kubernetes and how we can use it from our own tooling and go libraries. It seems backwards to have the cri layer here enable the quota then the snapshot set a static quota. I would think the quota would be set by the client adding a quota and the snapshotter enabling it via configuration and erroring out if the backing FS couldn't support it when enabled. The difficult part is figuring out from the client perspective whether quota is enabled so it knows whether it can avoid expensive ephemeral storage accounting (such as in the Kubelet).
That is a good suggestion. The quota setter now focuses on overlay, in pure Go. The snapshots package is also updated slightly, adding a new option WithQuotaSize; if the option is not defined, the default size from the overlayfs configuration is used, such as:
```toml
[plugins."io.containerd.snapshotter.v1.overlayfs"]
root_path = ""
quota_size = "128MB"
```
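For illustration, an option like WithQuotaSize would typically follow containerd's functional snapshot-option pattern, carrying the value through to the snapshotter. Everything below is a hypothetical, self-contained sketch: the `Info` struct is a simplified stand-in for containerd's `snapshots.Info`, and the label key is invented, not the PR's actual key.

```go
package main

import "fmt"

// Info is a simplified stand-in for containerd's snapshots.Info.
type Info struct {
	Name   string
	Labels map[string]string
}

// Opt mirrors containerd's functional snapshot-option signature.
type Opt func(*Info) error

// WithQuotaSize records the requested quota in a label so the snapshotter
// can pick it up later; the label key here is invented for the sketch.
func WithQuotaSize(size string) Opt {
	return func(info *Info) error {
		if info.Labels == nil {
			info.Labels = map[string]string{}
		}
		info.Labels["containerd.io/snapshot/quota-size"] = size
		return nil
	}
}

func main() {
	info := Info{Name: "example"}
	if err := WithQuotaSize("128MB")(&info); err != nil {
		panic(err)
	}
	fmt.Println(info.Labels["containerd.io/snapshot/quota-size"])
}
```

If the option is omitted, the snapshotter would fall back to the `quota_size` configured above.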
I built containerd to do some testing. Too many quotas are created in my environment:
```
[root@daocloud ~]# xfs_quota -xc "report -a -p -h" | head -n 100
Project quota on /var/lib/kubelet (/dev/sdc)
                        Blocks
Project ID       Used   Soft   Hard Warn/Grace
---------- ---------------------------------
#0               820K      0      0  00 [------]
volume1048590       0     8E     8E  00 [------]
volume1048605       0     8E     8E  00 [------]
volume1048607       0     8E     8E  00 [------]
volume1048610       0     8E     8E  00 [------]

Project quota on /var/lib/containerd (/dev/sdb)
                        Blocks
Project ID       Used   Soft   Hard Warn/Grace
---------- ---------------------------------
#0              10.2G      0      0  00 [------]
#2                  0      0      0  00 [------]
#3                  0    15G    15G  00 [------]
#4                12K    15G    15G  00 [------]
#5                  0    15G    15G  00 [------]
#6                  0    15G    15G  00 [------]
#7                  0    15G    15G  00 [------]
#8                  0    15G    15G  00 [------]
#9                  0    15G    15G  00 [------]
#10                 0    15G    15G  00 [------]
#11                 0    15G    15G  00 [------]
#12                 0    15G    15G  00 [------]
#13                 0    15G    15G  00 [------]
#14                 0    15G    15G  00 [------]
#15                 0    15G    15G  00 [------]
#16                 0    15G    15G  00 [------]
#17                 0    15G    15G  00 [------]
#18                 0    15G    15G  00 [------]
#19                 0    15G    15G  00 [------]
#20                 0    15G    15G  00 [------]
#21                 0    15G    15G  00 [------]
#22                 0    15G    15G  00 [------]
#23                 0    15G    15G  00 [------]
#24                4K    15G    15G  00 [------]
#25                 0    15G    15G  00 [------]
#26                 0    15G    15G  00 [------]
#27                 0    15G    15G  00 [------]
#28                 0    15G    15G  00 [------]
#29                4K    15G    15G  00 [------]
#30                 0    15G    15G  00 [------]
#31                 0    15G    15G  00 [------]
#32                 0    15G    15G  00 [------]
#33                 0    15G    15G  00 [------]
#34                 0    15G    15G  00 [------]
#35                4K    15G    15G  00 [------]
#36                4K    15G    15G  00 [------]
#37                 0    15G    15G  00 [------]
#38               540K   15G    15G  00 [------]
#39                 0    15G    15G  00 [------]
#40                 0    15G    15G  00 [------]
#41                 0    15G    15G  00 [------]
#42                 0    15G    15G  00 [------]
#43                 0    10G    10G  00 [------]
#44                 0    10G    10G  00 [------]
#45                 0    10G    10G  00 [------]
#46                 0    10G    10G  00 [------]
#47                 0    10G    10G  00 [------]
#48                 0    10G    10G  00 [------]
#49                 0    10G    10G  00 [------]
#50                 0    10G    10G  00 [------]
#51                 0    10G    10G  00 [------]
#52                 0    10G    10G  00 [------]
#53                 0    10G    10G  00 [------]
#54                 0    10G    10G  00 [------]
#55                 0    10G    10G  00 [------]
#56                 0    10G    10G  00 [------]
#57                 0    10G    10G  00 [------]
#58                 0    10G    10G  00 [------]
#59                 0    10G    10G  00 [------]
#60                 0    10G    10G  00 [------]
#61                 0    10G    10G  00 [------]
#62                 0    10G    10G  00 [------]
#63                 0    10G    10G  00 [------]
```
I set the size to 10G at first and 15G later.
```
[root@daocloud ~]# xfs_quota -xc "report -a -p -h" | wc -l
1956507
```
Yes, it is because the quota is set on committed layers too, but that is not expected.
I got some new errors with latest code.
```
Jul 06 14:57:03 paco containerd[10220]: time="2022-07-06T14:57:03.299383030+08:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
Jul 06 14:57:03 paco containerd[10220]: time="2022-07-06T14:57:03.299968488+08:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.overlayfs" error="Failed to set quota limit for projid 1 on /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/backingFsBlockDev: function not implemented"
Jul 06 14:57:03 paco containerd[10220]: time="2022-07-06T14:57:03.300169800+08:00" level=warning msg="could not use snapshotter overlayfs in metadata plugin" error="Failed to set quota limit for projid 1 on /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/backingFsBlockDev: function not implemented"
```
And the quota is not set as expected.
> I got some new errors with latest code. [...] And the quota is not set as expected.
Hi, could you give some suggestions?

I first suspected that the block device had disappeared, but it had not. I then guessed that GC of the uintptr variable resulted in the failure, but that is not the truth either. I tried the command `GOGC=off containerd -c myconfig.toml` and it always succeeded; the code below was in use, but it does not work:
```go
_, _, errno := syscall.Syscall6(syscall.SYS_QUOTACTL, uintptr(qcmd(Q_XSETQLIM, XFS_PROJ_QUOTA)),
	(*[2]uintptr)(unsafe.Pointer(&backingFsBlockDev))[0], uintptr(projectID),
	uintptr(unsafe.Pointer(&fd)), 0, 0)
```
Then I used bpftrace to debug; the retval of the syscall __x64_sys_quotactl could be 0, -2, or -20:

```
bpftrace -e 'kretprobe:__x64_sys_quotactl { printf("%s retval %d\n", kstack, retval) }'
```
Finally, using the code below to translate the string to a uintptr works fine:
```go
devbyte := append([]byte(backingFsBlockDev), 0)
_, _, errno := syscall.Syscall6(syscall.SYS_QUOTACTL, uintptr(qcmd(Q_XSETQLIM, XFS_PROJ_QUOTA)),
	uintptr(unsafe.Pointer(&devbyte[0])), uintptr(fd.id),
	uintptr(unsafe.Pointer(&fd)), 0, 0)
```
@yylt Do you have time to follow up on this? If not, I will work based on this PR later.
Feel free to close mine if you have time again.
Yes, I have time to complete the feature. But can you provide more guidance on the next steps besides rebasing the branch? Other than submitting the code, I'm not sure if more discussion is needed. @pacoxu
> Yes, I have time to complete the feature.

Great.

> But can you provide more guidance on the next steps besides rebasing the branch? Other than submitting the code, I'm not sure if more discussion is needed.

A rebase is needed first.
I am not sure about the future of this feature, but we have similar requirements as well: users want behavior similar to Docker in this case.
- From the Kubernetes perspective, a pod will be evicted if its disk usage reaches the limit on an emptyDir.
Questions we need to discuss about this feature:
- Collect clearer user stories: as far as I know, some users want a limited pod disk where the pod can keep working even after it reaches the limit (something like a pod running in maintenance mode, which also avoids putting the node disk under pressure). Actions like backup/archive/save-checkpoint are needed before removing a pod with a full disk.
- Should we support it in CRI? Then the kubelet can use this feature to define the quota later. It could be a configuration per node, per pod, or even per container.
- Is this a containerd configuration? Will CRI overwrite the configuration?
- I am not sure if this was discussed in a containerd meeting before. @dmcgowan @AkihiroSuda
- Is this the only solution? I heard there is another proposal based on the blockfile snapshotter to apply similar limits for disk and IO.
Discussions should be in the issue IMO. So I commented in https://github.com/containerd/containerd/issues/759#issuecomment-1982610845.
The main implementation steps for now are as follows:
- The snapshotter config is used to enable quota. If it cannot be enabled, an error is reported and the process exits.
- CRI is used to set the default quota size. When the snapshotter supports quota configuration, the quota is set; when it does not, nothing is done.
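The steps above can be sketched as a plain-Go flow. Everything here is hypothetical scaffolding, not the PR's actual types; the filesystem-support flag is passed in so the policy logic stands alone:

```go
package main

import (
	"errors"
	"fmt"
)

// snapshotterConfig and criConfig are simplified stand-ins for the real
// containerd configuration sections discussed above.
type snapshotterConfig struct{ EnableQuota bool }
type criConfig struct{ DefaultQuotaSize string }

// resolveQuota implements the described policy: if the snapshotter enables
// quota but the backing filesystem cannot support it, fail hard; if quota
// is supported, use the CRI default size; otherwise do nothing.
func resolveQuota(sc snapshotterConfig, cc criConfig, fsSupportsQuota bool) (string, error) {
	if !sc.EnableQuota {
		return "", nil // quota disabled: nothing to do
	}
	if !fsSupportsQuota {
		return "", errors.New("quota enabled but backing filesystem does not support project quotas")
	}
	return cc.DefaultQuotaSize, nil
}

func main() {
	size, err := resolveQuota(snapshotterConfig{EnableQuota: true}, criConfig{DefaultQuotaSize: "2M"}, true)
	fmt.Println(size, err)
}
```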
@dmcgowan @pacoxu If there is more discussion or suggestions, I am ready to make modifications :)
@AkihiroSuda Hi, all the code is pure Go; do you have any comments?
/retest
@yylt: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test
message.
In response to this:
/retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is it necessary to use the new version of the package in this way to use the fs magic? This looks quite ugly.

```
require github.com/containerd/continuity v0.4.4-0.20240406122502-56a67e3ba427
```
Can this be done in a separate PR?
It doesn't seem appropriate, because the overlay snapshotter's judgement of whether the filesystem supports setting quotas requires the fs magic numbers.
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.