shell-operator Gradual Increase in Memory Consumption

Hi,

Since 1.4.8 we have observed our shell-operator pods slowly consuming more memory over time.

I made a local branch with pprof installed, and it appears to be logrus that is not releasing its memory.
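For anyone who wants to capture a comparable heap profile, here is a minimal Go sketch of the pprof approach mentioned above. It is not the code from the local branch: the `heapProfile` helper and the `heap.pprof` file name are hypothetical, and it relies only on the standard `runtime/pprof` package.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfile (hypothetical helper) forces a garbage collection so the
// profile reflects live objects, then returns the heap profile bytes
// in the format understood by `go tool pprof`.
func heapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
		os.Exit(1)
	}
	// heap.pprof is an assumed output path for this sketch.
	if err := os.WriteFile("heap.pprof", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote heap.pprof (%d bytes)\n", len(data))
}
```

Taking two such profiles some hours apart and comparing them with `go tool pprof` is one way to see which call sites keep growing.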

Environment:

Shell-operator version: 1.4.8 - 1.4.11 (we've tested each version on release)
Kubernetes version: AKS 1.29.4
Installation type: Helm

Worth noting that 1.4.7 behaves as expected on the same cluster.

Anything else we should know?: I find it odd that nobody else is reporting this issue - I can only assume it's some oddity in our environment but I'm pretty much out of ideas.

From what I can see, the version of the logrus package hasn't changed between versions of this application (particularly 1.4.7 - 1.4.8). If you have any ideas on how we could debug further, that would be appreciated.

I've attached the heap dump in case it's of any help.

Thanks

heap.zip

Sep 11 '24 07:09 J0ram

Hit by this issue too. Tried setting GOMEMLIMIT with no luck (then checked the Go version: 1.19, which does not support a soft memory limit).
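One way to check from inside a process whether a soft memory limit is actually in effect is sketched below. This is only an illustration and not part of shell-operator: `currentMemoryLimit` is a hypothetical helper, and the sketch assumes a Go toolchain that provides `runtime/debug.SetMemoryLimit`. Passing a negative value to that function reports the current limit without changing it.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"runtime/debug"
)

// currentMemoryLimit (hypothetical helper) returns the soft memory limit
// currently in effect. A negative argument to SetMemoryLimit leaves the
// limit untouched and only returns the existing value.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	// The Go version the binary was built with.
	fmt.Println("Go version:", runtime.Version())

	limit := currentMemoryLimit()
	if limit == math.MaxInt64 {
		// math.MaxInt64 is the default and means the limit is effectively off.
		fmt.Println("no soft memory limit is set")
	} else {
		fmt.Printf("soft memory limit: %d bytes\n", limit)
	}
}
```

Running this with and without `GOMEMLIMIT` exported shows whether the environment variable reaches the runtime at all.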

Shell Operator: 1.4.12
K8s: 1.30.3
Linux Kernel: 6.6.52 with THP enabled in madvise mode (I think this is relevant for Go > 1.20)

Reproducer project: https://github.com/cit-consulting/hetzner-failoverip-controller

Sep 27 '24 08:09 vladimirfx

Also hitting this.

Shell Operator: 1.4.10
K8s: 1.29.8

Oct 02 '24 09:10 sidineyc

Same here, with multiple operators running on different clusters using 1.4.10. The pod crashes and restarts when it hits its memory limit.

[Screenshot 2024-10-18 at 14 51 25]