Add mem soft limit for Go 1.19 support
This makes the Go GC use the memory limit that is allocatable for the container instead of the one set for the host. If no resource limit is set, it defaults to what is allocatable according to the K8S node.
Result of running without resource limits set (t3.small, 2.0 GB memory):
# kubectl get nodes -o yaml
allocatable:
  cpu: 1930m
  ephemeral-storage: "76224326324"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 1507636Ki
# kubectl exec -it nats-mem-0 -- /bin/sh
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1948         594         140           1        1213        1234
Swap:             0           0           0
$ env
GOMEMLIMIT=1543819264
$ cat /proc/meminfo
MemTotal: 1995060 kB
MemFree: 144400 kB
MemAvailable: 1264464 kB
Buffers: 2700 kB
Cached: 1020520 kB
SwapCached: 0 kB
Active: 940764 kB
Inactive: 517396 kB
Active(anon): 368708 kB
Inactive(anon): 424 kB
Active(file): 572056 kB
Inactive(file): 516972 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 140 kB
Writeback: 0 kB
AnonPages: 434964 kB
Mapped: 185796 kB
Shmem: 1776 kB
KReclaimable: 218872 kB
Slab: 314596 kB
SReclaimable: 218872 kB
SUnreclaim: 95724 kB
KernelStack: 9456 kB
PageTables: 8544 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 997528 kB
Committed_AS: 5453344 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 18660 kB
VmallocChunk: 0 kB
Percpu: 1688 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 276392 kB
DirectMap2M: 1781760 kB
DirectMap1G: 0 kB
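For reference, the GOMEMLIMIT value in the output above (1543819264 bytes) is exactly the node's allocatable memory (1507636Ki x 1024), which is what the Downward API reports for limits.memory when the container has no memory limit set. A minimal sketch of how the variable can be wired through the Downward API (the divisor defaults to "1", so the memory quantity is rendered as a plain integer of bytes, which GOMEMLIMIT accepts); the actual chart template may differ:

env:
  - name: GOMEMLIMIT
    valueFrom:
      resourceFieldRef:
        # Falls back to node allocatable when no memory limit is set on the container
        resource: limits.memory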
From the GC tuning guide: https://go.dev/doc/gc-guide#Memory_limit
Do take advantage of the memory limit when the execution environment of your Go program is entirely within your control, and the Go program is the only program with access to some set of resources (i.e. some kind of memory reservation, like a container memory limit).
A good example is the deployment of a web service into containers with a fixed amount of available memory.
In this case, a good rule of thumb is to leave an additional 5-10% of headroom to account for memory sources the Go runtime is unaware of.
Ideally we should set GOMEMLIMIT 5-10% below the actual limit; I wonder if there is a good approach to set this dynamically...
Since the memory used to map the binary itself is not counted towards Go's calculations, but AFAIK is counted by container systems, I think there needs to be at least some allowance for the parts of the binary which will be mapped into memory (so, not debug symbols etc).
I think the biggest issue is that the limits are string specifications with scaling suffixes, which makes math harder to do without explicit support. Otherwise, I'd say to look for .Values.nats.resources.limits.memory and only if that is set, then set GOMEMLIMIT based upon a text/template calculation from that; it looks as though Helm loads some math functions into the template evaluation context, see https://helm.sh/docs/chart_template_guide/function_list/.
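As a rough sketch of what such a calculation could look like, assuming (purely for illustration) that the limit is always written with an Mi suffix and that 10% headroom is wanted; trimSuffix, atoi, mul and div are all available from Sprig:

{{- /* Hypothetical sketch: derive GOMEMLIMIT as 90% of the memory limit.
       Assumes the limit is written with an Mi suffix, e.g. "512Mi". */}}
{{- with .Values.nats.resources }}
{{- with .limits }}
{{- if .memory }}
{{- $limitMi := .memory | trimSuffix "Mi" | atoi }}
- name: GOMEMLIMIT
  value: {{ printf "%dMiB" (div (mul $limitMi 90) 100) | quote }}
{{- end }}
{{- end }}
{{- end }}

With limits.memory set to "512Mi" this would render value: "460MiB"; any other suffix would need its own handling, which is exactly the parsing problem mentioned above.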
Do we think it's common for K8S controllers to mutate job specs to set resource limits where they're not already set, so that the values coming from the Helm chart might not be the runtime values, thus the need for using the Downward API?
If not, then perhaps the best approach is to add support for .Values.nats.gomemlimit as a new string: if that is set, use it as the value for GOMEMLIMIT (with a sentinel to skip setting it at all?); else if .Values.nats.resources.limits.memory is set, use that for GOMEMLIMIT; else don't set it, since it would then just be system memory and there's no point in setting it.
Minimally, if we can't do a calculation and want to keep using the Downward API to support mutated values, then perhaps support .Values.nats.gomemlimit as an override, so that if the value is not small enough and containers keep dying, the humans managing the deployment can at least override it?
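In template form, that override could look something like the sketch below (the gomemlimit value name is the proposal above, not something the chart has today):

{{- if .Values.nats.gomemlimit }}
# Human-supplied override, passed straight through to the container
- name: GOMEMLIMIT
  value: {{ .Values.nats.gomemlimit | quote }}
{{- else }}
# Keep using the Downward API so mutated/runtime limits are picked up
- name: GOMEMLIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
{{- end }}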
Here is another potential issue: these strings do not necessarily support the same suffixes.
GOMEMLIMIT - https://pkg.go.dev/runtime#hdr-Environment_Variables
The supported suffixes include B, KiB, MiB, GiB, and TiB
Kubernetes - https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory
Limits and requests for memory are measured in bytes. You can express memory as a plain integer or as a fixed-point number using one of these quantity suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.
I don't know if the Downward API standardizes the Kubernetes suffixes. But if we go back to the Helm nats.resources.limits.memory variable, then we are on the hook for parsing the string into something that GOMEMLIMIT can understand. I'm not seeing any Helm built-ins that do that.
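To make the mismatch concrete, the same quantity has to be spelled differently on each side (the power-of-two suffixes differ by a trailing B):

# Kubernetes resource quantity (power-of-two suffix "Mi"):
memory: "512Mi"
# Equivalent GOMEMLIMIT settings (suffix "MiB", or a plain integer of bytes):
GOMEMLIMIT=512MiB
GOMEMLIMIT=536870912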
Let's keep it simple and just have an option to set this explicitly via .Values.nats.gomemlimit. There are some libraries a la automaxprocs being released that we might be able to use to let the server set it on the fly in future releases.
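As a hypothetical values.yaml shape for that option:

nats:
  # Soft memory limit handed to the Go runtime via GOMEMLIMIT,
  # e.g. roughly 90% of a 512Mi container limit.
  # Leave unset to not set the variable at all.
  gomemlimit: "460MiB"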