k8s icon indicating copy to clipboard operation
k8s copied to clipboard

Add mem soft limit for Go 1.19 support

Open wallyqs opened this issue 3 years ago • 3 comments

This makes Go GC use the memory limit that it is allocatable for the container instead the one set for the host. In case there is no resource limit set, then it would use by default what is allocatable according to the K8S node.

wallyqs avatar Aug 02 '22 23:08 wallyqs

Result of running without resource limits set (t3.small 2.0GB mem)

# kubectl get nodes -o yaml
    allocatable:
      cpu: 1930m
      ephemeral-storage: "76224326324"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 1507636Ki
# kubectl exec -it nats-mem-0 -- /bin/sh
$ free -m
               total        used        free      shared  buff/cache   available
Mem:            1948         594         140           1        1213        1234
Swap:              0           0           0

$ env    
GOMEMLIMIT=1543819264

$ cat /proc/meminfo 
MemTotal:        1995060 kB
MemFree:          144400 kB
MemAvailable:    1264464 kB
Buffers:            2700 kB
Cached:          1020520 kB
SwapCached:            0 kB
Active:           940764 kB
Inactive:         517396 kB
Active(anon):     368708 kB
Inactive(anon):      424 kB
Active(file):     572056 kB
Inactive(file):   516972 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               140 kB
Writeback:             0 kB
AnonPages:        434964 kB
Mapped:           185796 kB
Shmem:              1776 kB
KReclaimable:     218872 kB
Slab:             314596 kB
SReclaimable:     218872 kB
SUnreclaim:        95724 kB
KernelStack:        9456 kB
PageTables:         8544 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      997528 kB
Committed_AS:    5453344 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       18660 kB
VmallocChunk:          0 kB
Percpu:             1688 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      276392 kB
DirectMap2M:     1781760 kB
DirectMap1G:           0 kB

wallyqs avatar Aug 03 '22 00:08 wallyqs

From the GC tuning guide: https://go.dev/doc/gc-guide#Memory_limit

Do take advantage of the memory limit when the execution environment of your Go program is entirely within your control, and the Go program is the only program with access to some set of resources (i.e. some kind of memory reservation, like a container memory limit).

A good example is the deployment of a web service into containers with a fixed amount of available memory.

In this case, a good rule of thumb is to leave an additional 5-10% of headroom to account for memory sources the Go runtime is unaware of.

Ideally we should make the GOMEMLIMIT be 5-10% off from the actual limit, wonder if there is good approach to dynamically set this...

wallyqs avatar Aug 03 '22 01:08 wallyqs

Since the memory for mapping the binary itself is not counted towards Go's calculations, but AFAIK is for container systems, I think that there needs to be at least some allowance for the size of the bits of the binary which will be mapped into memory (so, not debug symbols etc).

I think the biggest issue is that the limits are string specifications with scaling suffices, which makes math harder to do without explicit support. Otherwise, I'd say to look for .Values.nats.resources.limits.memory and only if that is set, then set GOMEMLIMIT based upon a text/template calculation from that; it looks as though Helm loads some math functions into the template evaluation context, see https://helm.sh/docs/chart_template_guide/function_list/.

Do we think it's common for K8S controllers to mutate job specs to set resource limits where they're not already set, so that the values coming from the Helm chart might not be the runtime values, thus the need for using the Downward API?

If not, then perhaps the best approach then is to add support for .Values.nats.gomemlimit as a new string and if that is set, then use it as a value for GOMEMLIMIT (with a sentinel to skip setting it at all?); else if .Values.nats.resources.limits.memory is set, then use that for GOMEMLIMIT; else don't set it, since it will then just be system memory and so there's no point setting it?

Minimally, if we can't do a calculation and want to keep using Downward API to support mutated values, then perhaps support .Values.nats.gomemlimit as an override, so at least if the value is not small enough and containers keep dying, then the humans managing the deployment can override it?

philpennock avatar Aug 03 '22 15:08 philpennock

Here is another potential issue, these strings do not necessarily support the same suffixes

GOMEMLIMIT - https://pkg.go.dev/runtime#hdr-Environment_Variables

The supported suffixes include B, KiB, MiB, GiB, and TiB

Kubernetes - https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory

Limits and requests for memory are measured in bytes. You can express memory as a plain integer or as a fixed-point number using one of these quantity suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.

I don't know if the Downward API standardizes the Kubernetes suffixes. But if we go back to the Helm nats.resources.limits.memory variable then we are on the hook for parsing the string into something that GOMEMLIMIT can understand. Not seeing any helm built-ins that do that.

caleblloyd avatar Aug 25 '22 14:08 caleblloyd

Let's keep it simple and just have an option to set this explicitly via .Values.nats.gomemlimit, there are some libraries a la automaxprocs being released that we might be able to use to let the server set it on the fly in next releases.

wallyqs avatar Sep 09 '22 17:09 wallyqs