runtime-tools
validation/linux_cgroups_*hugetlb: Use smaller limits
The previous values were giving me:
container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:367: setting cgroup config for procHooks
process caused \\\"failed to write 56892210544640 to hugetlb.1GB.limit_in_bytes: open /sys/fs/cgroup/hugetlb/.../hugetlb.1GB.limit_in_bytes: permission denied\\\"\""
The previous values are originally from 432615a0 (#93), which doesn't motivate their choice. The new values are copy/pasted from the spec (which doesn't motivate its choice either ;). I've kept something like @alban's comment from #605 to at least explain how the limit breaks down.
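To illustrate how a limit breaks down into pages, here is a minimal Go sketch that divides the rejected value from the error message above by the 1GB page size. The helper name `pagesForLimit` is hypothetical and is not part of runtime-tools.

```go
package main

import "fmt"

// pagesForLimit reports how many whole huge pages of pageSize bytes fit in
// limit bytes, plus any leftover bytes that do not fill a page.
// (Hypothetical helper, for illustration only.)
func pagesForLimit(limit, pageSize uint64) (pages, remainder uint64) {
	return limit / pageSize, limit % pageSize
}

func main() {
	const gib = uint64(1) << 30
	// 56892210544640 is the value from the error message above.
	pages, rem := pagesForLimit(56892210544640, gib)
	fmt.Printf("%d pages of 1GB, %d bytes left over\n", pages, rem)
}
```

The old limit works out to exactly 52985 pages of 1GB with nothing left over.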
In testing with my local system, the issue seems to be pageSize and not the limit value. That seems to be supported by the kernel docs, which have:
hugepages=	[HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
hugepagesz=	[HW,IA-64,PPC,X86-64] The size of the HugeTLB pages. On x86-64 and powerpc, this option can be specified multiple times interleaved with hugepages= to reserve huge pages of different sizes. Valid pages sizes on x86-64 are 2M (when the CPU supports "pse") and 1G (when the CPU supports the "pdpe1gb" cpuinfo flag).
My CPU supports both:
$ cat /proc/cpuinfo | grep '^flags' | head -n1 | grep -o ' \(pse\|pdpe1gb\) '
pse
pdpe1gb
but I don't set hugepagesz, and I seem to only get 2M by default. I can get 1GB entries by booting with hugepagesz=1GB. Longer-term, we may want to auto-detect the value(s) currently enabled by the host system, but for this commit I'm hard-coding 2MB.
I've just realized that this PR would conflict with https://github.com/opencontainers/runtime-tools/pull/637, which touches hugetlb as well. I'm fine with this PR being merged first, though; I can rebase again.
Interestingly, on my local machine with Fedora 28, both page sizes 2MB and 1GB are available, so I'm able to run the hugetlb tests without any issue. There must be other systems where 1GB is not available, though, so I think it's OK to set the page size to 2MB.
On my local machine with Fedora 28, both page sizes 2MB and 1GB are available...
What do you get from:
$ cat /proc/cpuinfo | grep '^flags' | head -n1 | grep -o ' \(pse\|pdpe1gb\) '
$ cat /proc/cmdline
@wking
# cat /proc/cpuinfo | grep '^flags' | head -n1 | grep -o ' \(pse\|pdpe1gb\) '
pse
pdpe1gb
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.16.9-300.fc28.x86_64 root=UUID=e54de305-64fd-4729-8e65-dd68a8fd36fe ro rootflags=subvol=root rhgb quiet resume=/dev/nvme0n1p4 LANG=en_US.UTF-8
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.16.9-300.fc28.x86_64 root=UUID=e54de305-64fd-4729-8e65-dd68a8fd36fe ro rootflags=subvol=root rhgb quiet resume=/dev/nvme0n1p4 LANG=en_US.UTF-8
Hmm, I was expecting a hugepagesz entry. Maybe they've compiled in a default...
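For reference, the check being done by eye here can be sketched in Go: split the kernel command line on whitespace and collect the values of any `hugepagesz=` arguments. The function name `hugepageszArgs` is hypothetical, and the sketch only looks at the command line, so it says nothing about defaults compiled into the kernel.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hugepageszArgs returns the values of all hugepagesz= arguments found in a
// kernel command line, in order of appearance.
// (Hypothetical helper, for illustration only.)
func hugepageszArgs(cmdline string) []string {
	var values []string
	for _, field := range strings.Fields(cmdline) {
		if strings.HasPrefix(field, "hugepagesz=") {
			values = append(values, strings.TrimPrefix(field, "hugepagesz="))
		}
	}
	return values
}

func main() {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read kernel command line:", err)
		return
	}
	fmt.Println(hugepageszArgs(string(data)))
}
```

For the Fedora 28 command line quoted above, this returns no values, which matches the observation that there is no hugepagesz entry.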