
Increased memory usage with latest version

Open s4mur4i opened this issue 1 year ago • 5 comments




Describe the bug We upgraded popeye from version 0.11.1 to 0.21.3. Previously popeye ran with limits of 100 CPU and 100-200 MB of memory; with the latest version it needs 500 CPU, and with under 1 GB of memory it gets OOM killed. On some clusters it even requires 4.5 GB of memory. Is this expected, or why did popeye suddenly start using so much memory?

To Reproduce Steps to reproduce the behavior:

  1. Upgraded popeye to the latest version

Expected behavior I would be okay with some increase, but a 3-4 GB memory requirement seems too much when the previous version used around 100-200 MB.

Screenshots Grafana output of one run (screenshot: 2024-09-04 at 11:37)

Versions (please complete the following information):

  • Popeye v0.21.3
  • K8s 1.29.7-eks

s4mur4i avatar Sep 04 '24 08:09 s4mur4i

@s4mur4i Thanks for reporting this! How big is your cluster? Nodes, pods, etc... Also, how are you running popeye, i.e. wide open or using filters?

derailed avatar Sep 14 '24 17:09 derailed

Hello, sorry, I was on holiday and could not respond for some time. Our clusters usually have around 10-20 nodes, and some might go up to 30 nodes. For pods, I would say 300-600. We have different products; each product has its own cluster per dev/prod environment. We use the following arguments:

    -A -f /spinach/spinach.yaml --out html -l info --s3-bucket xyz --push-gtwy-url http://pushgateway-service:9091 --cluster xyz --kubeconfig /etc/kubeconfig/config.yaml --force-exit-zero=true
We tested separating the pushgateway and bucket upload into 2 separate pods, as was done previously, but it didn't lower memory usage.

s4mur4i avatar Sep 30 '24 06:09 s4mur4i

@s4mur4i Thank you for the details! Popeye now uses an in-memory database that is loaded and kept around until the process finishes. For larger clusters this could make the memory footprint larger than in prior releases. I'll take a peek to see if we can trim things a bit more. In the meantime, could you share your spinach file? It will also help if you target specific namespaces rather than all namespaces, since the in-memory corpus will be much smaller.
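
For illustration only, a scoped run could look something like the sketch below, reusing your existing arguments; the -n flag limits the scan to a single namespace, and the namespace name here is just an example:

    # sketch: scan a single namespace instead of -A (all namespaces) to shrink the in-memory corpus
    popeye -n cloud -f /spinach/spinach.yaml --out html -l info \
      --cluster xyz --kubeconfig /etc/kubeconfig/config.yaml --force-exit-zero=true

Running one such scan per namespace you actually own keeps each run's working set small, at the cost of a few extra invocations.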

derailed avatar Dec 30 '24 19:12 derailed

Hello @derailed Thanks for the information. Can we get some metrics or details on what is loaded into the in-memory database? I can try running some special builds to understand more deeply how it performs, or which part is growing too big (a rough sketch of what I'd try first is below, after the config). Generally, when we deploy even 1-2 more services to a cluster, the memory footprint can increase by 200-300 MB. Our spinach config is quite simple:

    popeye:
      excludes:
        linters:
          clusterroles:
            codes:
              - '400'
          configmaps:
            codes:
              - '400'
              - '401'
          daemonsets:
            instances:
              - fqns: ['rx:kube-system/kube-proxy']
                codes:
                  - 107
            codes:
              - '108'
              - '505'
          deployments:
            codes:
              - '108'
              - '505'
          horizontalpodautoscalers:
            codes:
              - '602'
              - '603'
              - '604'
              - '605'
          namespaces:
            instances:
              - fqns: ['default', 'kube-node-lease', 'kube-public']
                codes:
                  - 400
          nodes:
            codes:
              - '700'
              - '709'
              - '710'
          pods:
            instances:
              - fqns:
                  [
                    'rx:kube-system/node-tagger*',
                    'rx:kube-system/kube-proxy*',
                    'rx:kube-system/aws-node*',
                    'rx:kube-system/ebs-csi*',
                    'rx:cloud/github-actions*',
                  ]
                codes:
                  - 102
              - fqns: ['rx:kube-system/ebs-csi*']
                codes:
                  - 104
              - fqns: ['rx:cronjob']
                codes:
                  - 206
              - fqns:
                  [
                    'rx:xyz',
                    'rx:xyz',
                    'rx:xyz',
                    'cloud/xyz',
                  ]
                codes:
                  - 206
            codes:
              - '105'
              - '108'
              - '109'
              - '110'
              - '111'
              - '112'
              - '203'
              - '204'
              - '205'
              - '207'
              - '300'
              - '301'
              - '302'
              - '306'
              - '1204'
          secrets:
            codes:
              - '400'
              - '401'
          services:
            codes:
              - '1101'
              - '1102'
              - '1103'
              - '1104'
              - '1109'
          persistentvolumeclaims:
            instances:
              - fqns: ['cloud/xyz']
                codes:
                  - 400
          serviceaccounts:
            instances:
              - fqns: ['default/default', 'kube-node-lease/default', 'kube-public/default']
                codes:
                  - 400
              - fqns:
                  ['kube-system/ebs-csi-controller-sa', 'kube-system/ebs-csi-controller-sa', 'kube-system/ebs-csi-node-sa']
                codes:
                  - 303
          statefulsets:
            codes:
              - '108'
              - '503'
          ingresses:
            codes:
              - '1403'
          cronjobs:
            codes:
              - '1501'
              - '1502'
              - '1503'
          jobs:
            codes:
              - '1503'
          clusterrolebindings:
            instances:
              - fqns: ['system:controller:route-controller', 'system:kube-dns']
                codes:
                  - 1300
I removed some of our internal application names and replaced them with xyz; I don't believe they are relevant to this case.
Our namespaces are quite simple:

cloud, default, kube-node-lease, kube-public, kube-system

Other clusters have 1 extra namespace, and one of them has 2 extra ones. kube-public, default, and kube-node-lease are not utilized by us, and kube-system is only used for default services, so all our services live in the cloud namespace or another organization-specific one.
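
Before doing a custom build, one quick thing I can try on the stock binary is the Go runtime's GC trace, which prints heap sizes per garbage collection and should show roughly how large the in-memory corpus gets during a scan. This is just a sketch reusing our arguments; the log file name is illustrative:

    # print per-GC heap sizes to stderr while a normal scan runs
    GODEBUG=gctrace=1 popeye -A -f /spinach/spinach.yaml --out html -l info \
      --cluster xyz --kubeconfig /etc/kubeconfig/config.yaml --force-exit-zero=true 2> gctrace.log

If that isn't enough detail, I can follow up with a special build that exposes Go's standard pprof heap profile.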

s4mur4i avatar Dec 31 '24 11:12 s4mur4i

+1

sunilhonest avatar Aug 08 '25 04:08 sunilhonest