Gradual Increase in Memory Consumption
Hi,
Since 1.4.8 we have observed our shell-operator pods slowly consuming memory over time:
I made a local branch with pprof installed and it appears to be logrus that is not releasing its memory:
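Roughly, the profiling was enabled by exposing `net/http/pprof` (a minimal sketch; the port and wiring here are illustrative, not the exact code in the branch):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// The heap profile becomes available at /debug/pprof/heap and can be pulled with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the operator's real work loop
}
```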
Environment:
- Shell-operator version: 1.4.8 - 1.4.11 (we've tested each version on release)
- Kubernetes version: AKS 1.29.4
- Installation type: Helm
Worth noting that 1.4.7 behaves as expected on the same cluster.
Anything else we should know? I find it odd that nobody else is reporting this issue; I can only assume it's some oddity in our environment, but I'm pretty much out of ideas.
From what I can see, the version of the logrus package hasn't changed between versions of this application (particularly 1.4.7 - 1.4.8). If you have any ideas on how we could debug further, that would be appreciated.
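For what it's worth, one way to confirm which logrus version is baked into a given binary is to read the embedded build info; `go version -m <binary>` prints the same information from the command line. An illustrative standalone snippet (not shell-operator code):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Print the logrus version recorded in the binary's embedded build info.
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	for _, dep := range info.Deps {
		if dep.Path == "github.com/sirupsen/logrus" {
			fmt.Println("logrus:", dep.Version)
		}
	}
}
```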
I've attached the heap dump in case it's of any help.
Thanks
Hit by this issue. Tried setting GOMEMLIMIT with no luck (then checked the Go version = 1.19, which does not support a soft memory limit); a sketch of the programmatic equivalent is included below.
Shell Operator: 1.4.12
K8s: 1.30.3
Linux Kernel: 6.6.52 with THP enabled in madvise mode (it is relevant for Go > 1.20 I think)
Reproducer project: https://github.com/cit-consulting/hetzner-failoverip-controller
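For reference, the programmatic equivalent of the GOMEMLIMIT environment variable is `runtime/debug.SetMemoryLimit`. A minimal illustrative sketch (the 400 MiB value is arbitrary, and it assumes a runtime that supports the soft limit):

```go
package main

import "runtime/debug"

func main() {
	// Equivalent to GOMEMLIMIT=400MiB: a soft limit that makes the GC work
	// harder as the live heap approaches it, instead of letting RSS grow freely.
	debug.SetMemoryLimit(400 << 20)

	// ... operator work would run here ...
}
```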
Also hitting this.
Shell Operator: 1.4.10
K8s: 1.29.8
Same here with multiple operators running on different clusters using 1.4.10. Pod crashes and restarts when it hits memory limit.
Checked 1.4.14 - classic memory leak:
Because of Go 1.22 and GOMEMLIMIT, the operator spends a lot of CPU on GC before being killed by the kubelet.
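To put a number on the GC overhead, something along these lines can be used to watch the GC's CPU share over time (an illustrative standalone sketch, not shell-operator code); running the pod with `GODEBUG=gctrace=1` gives similar information from the runtime itself:

```go
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	for range time.Tick(30 * time.Second) {
		runtime.ReadMemStats(&m)
		// GCCPUFraction: fraction of the program's available CPU time spent
		// in the GC since the process started.
		log.Printf("heap_alloc=%d MiB gc_cpu_fraction=%.3f num_gc=%d",
			m.HeapAlloc>>20, m.GCCPUFraction, m.NumGC)
	}
}
```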
Hello. Thank you for the report. We also ran into the logrus leak a while ago; we're currently working on changing the logger.
We have a quick fix in v1.4.15. Could you try it, please?
The memory profile looks better, but now I'm hitting log duplication: https://github.com/flant/shell-operator/issues/675
Will keep monitoring.
We've been running 1.4.15 overnight - memory usage is completely flat :)
Thanks all