
Eventrouter - memory leak, fails silently after 3GB

Open jeremych1000 opened this issue 9 months ago • 0 comments

Bugs should be filed for issues encountered whilst operating logging-operator. You should first attempt to resolve your issues through the community support channels, e.g. Slack, in order to rule out individual configuration errors. #logging-operator Please provide as much detail as possible.

Describe the bug: We use eventtailer with logging-operator to log Kubernetes events. The image used is 0.4.0 from https://github.com/kube-logging/eventrouter. I see it's a fork and that #1966 recognises that this isn't great.

We've seen eventrouter's memory usage grow linearly over time, and it failed silently once it had consumed 3 GB. Restarting it fixed the problem.

Expected behaviour: Memory usage of event tailer to be more or less constant.

Steps to reproduce the bug: Monitor event tailer memory usage across a few days. Here's a screenshot.

The dropoff is caused by us merging a change which added requests/limits at 4pm on 19/3. You can still see the memory leak, as the graph continues to climb until the pod gets OOM-killed.

[Image: screenshot of event tailer memory usage over time]
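
For anyone wanting to check whether they are hitting the same pattern, here is a rough sketch of how the growth rate could be estimated from a handful of memory readings. It fits a least-squares line through (time, memory) samples and estimates how long until a given limit is reached. The function names and the sample numbers are hypothetical and purely illustrative; they are not measurements from this cluster, and only the 250Mi limit comes from the manifest below.

```python
from typing import Sequence


def leak_rate(hours: Sequence[float], mem_mib: Sequence[float]) -> float:
    """Return the least-squares slope of memory (MiB) against time (hours)."""
    n = len(hours)
    if n < 2 or n != len(mem_mib):
        raise ValueError("need at least two samples and equal-length inputs")
    mean_t = sum(hours) / n
    mean_m = sum(mem_mib) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(hours, mem_mib))
    return cov / var_t


def hours_until_limit(current_mib: float, limit_mib: float, rate: float) -> float:
    """Estimate hours until the limit is hit, assuming growth stays linear."""
    if rate <= 0:
        return float("inf")  # flat or shrinking usage never reaches the limit
    return max(0.0, (limit_mib - current_mib) / rate)


if __name__ == "__main__":
    # Hypothetical readings taken every 6 hours (illustrative values only).
    sample_hours = [0, 6, 12, 18, 24]
    sample_mib = [100, 115, 130, 145, 160]
    rate = leak_rate(sample_hours, sample_mib)
    remaining = hours_until_limit(sample_mib[-1], 250, rate)
    print(f"growth: {rate:.2f} MiB/h, about {remaining:.0f} h until a 250 MiB limit")
```

A roughly constant positive slope across several days would match the linear growth described above; a slope near zero would suggest the usage has plateaued.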

Additional context: We are running it with vanilla config. Manifest posted below.

Environment details:

  • Kubernetes version (e.g. v1.15.2): v1.30.6
  • Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): on prem
  • logging-operator version (e.g. 2.1.1): 4.10.0, which is not the newest, but the eventrouter image is still 0.4.0
  • Install method (e.g. helm or static manifests): argocd
  • Logs from the misbehaving component (and any other relevant logs): no relevant logs
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:
apiVersion: logging-extensions.banzaicloud.io/v1alpha1
kind: EventTailer
metadata: <removed for brevity>
spec:
  containerOverrides:
    resources:
      limits:
        cpu: 50m
        memory: 250Mi
      requests:
        cpu: 10m
        memory: 100Mi
  controlNamespace: <redacted>
  image:
    imagePullSecrets: []
    pullPolicy: IfNotPresent
    repository: <redacted - we use a proxy>
    tag: 0.4.0
  positionVolume:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        volumeMode: Filesystem
  workloadMetaOverrides:
    labels:
      logging-operator/component: eventTailer

/kind bug

jeremych1000, Mar 26 '25 11:03