Skydive agent and analyzer taking lots of CPU cycles and memory

  insight                    netflow-collector-skydive-agent-b26tm                     100m (2%)     2 (51%)      512Mi (3%)       8Gi (61%)      27d
  insight                    netflow-collector-skydive-analyzer-7497cd8b79-5gz8v       100m (2%)     2 (51%)      512Mi (3%)       8Gi (61%)      27d
  insight                    skydive-operator-67c958f454-4rxl6                         0 (0%)        0 (0%)       0 (0%)           0 (0%)         27d

We can see from the above snapshot that the agent and analyzer are each taking 2 cores (and have exceeded the limit we put on them), with requests of 100m. What is the reason behind such huge CPU usage? Memory consumption is also very high: 61% of the assigned 8Gi limit. We used https://github.com/skydive-project/skydive-operator to install the operator on the IBM Cloud Kubernetes Service (classic infrastructure).

Kubernetes version is v1.16.14

Jan 07 '21 11:01 ashishth09

Hi @ashishth09, can you explain more about the environment: how many hosts and pods are there, and what workload is running in that environment?

Also: (1) I see 27 days. Is the high CPU new (after running for a long time), or are things OK after a reboot? (2) 100m, to the best of my knowledge, is 0.1 of a core (10% of one core). Can you paste the command that you used to get that output?
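
To make the unit concrete: Kubernetes expresses CPU in cores, and the m suffix means thousandths of a core (millicores). A minimal shell sketch of the conversion follows; cpu_to_cores is a hypothetical helper name used only for illustration, not part of kubectl.

```shell
# Convert a Kubernetes CPU quantity ("100m", "2", "0.5") to cores.
# cpu_to_cores is a hypothetical helper, not a kubectl feature.
cpu_to_cores() {
  case "$1" in
    # A trailing "m" means millicores: divide by 1000.
    *m) awk -v v="${1%m}" 'BEGIN { printf "%.3f\n", v / 1000 }' ;;
    # No suffix: the value is already in cores.
    *)  awk -v v="$1" 'BEGIN { printf "%.3f\n", v }' ;;
  esac
}

cpu_to_cores 100m    # the 100m value from the snapshot -> 0.100
cpu_to_cores 2       # the 2 value from the snapshot    -> 2.000
cpu_to_cores 1000m   # one full core                    -> 1.000
```

So the 100m and 2 values in the snapshot differ by a factor of 20.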

Jan 07 '21 12:01 eranra

@eranra We are gathering the above information and will share it once we have it. We used the kubectl top command.
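
For comparison, kubectl top reports live usage as plain values, while the layout of the snapshot (a value followed by a percentage, then an age column) resembles the "Non-terminated Pods" table printed by kubectl describe node, which lists configured requests and limits rather than consumption. The sketch below shows one way to put the two side by side. It assumes the pods are in the insight namespace as in the snapshot and that metrics-server is available; skydive_usage and skydive_limits are hypothetical wrapper names.

```shell
# Assumption: the pods live in the "insight" namespace, as in the snapshot.
# Both helper names are hypothetical wrappers around real kubectl subcommands.

# Live CPU/memory usage per container; needs metrics-server in the cluster.
skydive_usage() {
  kubectl top pod --namespace "${1:-insight}" --containers
}

# Configured requests and limits per pod (static values, not usage).
skydive_limits() {
  kubectl get pods --namespace "${1:-insight}" \
    -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,CPU_LIM:.spec.containers[*].resources.limits.cpu,MEM_LIM:.spec.containers[*].resources.limits.memory'
}
```

Running skydive_usage and then skydive_limits against the same namespace shows whether the 2-core figure is actual consumption or only the configured ceiling.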

Jan 07 '21 16:01 ajaysikdar

@eranra

Skydive was installed on 20+ clusters. It showed high CPU right after installation, and a reboot didn't help. We had to uninstall it completely.

Jan 08 '21 13:01 ajaysikdar

@ashishth09

How many nodes did you have in each cluster, how many pods on each node, and how many flows per second? I am trying to understand the amount of traffic in your systems to see whether this is connected. On typical machines captures are not very extensive, but if you configure multiple captures (on many interfaces) and have a lot of traffic, the CPU usage might go high.

There are multiple alternatives: some reduce the number of parallel captures, others move to more efficient captures such as eBPF (http://skydive.network/blog/skydive-with-ebpf.html). But before going there, maybe you should check things like the 100m figure: is it really 2 CPUs that Skydive consumes, or much lower than that?

Jan 08 '21 14:01 eranra

@eranra,

Do you know how we can verify whether Skydive really consumes 2 CPUs, or much lower than that?

Jan 11 '21 05:01 ajaysikdar

@ashishth09 The easiest way, if you can connect to the k8s node (the VM), is to execute something like: ps aux --sort=-pcpu | head -n 10 (this shows the top CPU-consuming processes on the node). Make sure you connect to the correct node, the one running the Skydive analyzer and agents. Another option is to increase the limit in the deployment and daemon-set and see if Skydive works better. For example, change it to 1000m (== 1 virtual CPU), wait for the new pods to be deployed, and check that the limit really changed.
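
As a rough sketch of the ps approach, the helpers below narrow the ps aux output to Skydive processes and express their combined %CPU in cores (in ps, 100% corresponds to one fully busy core). filter_skydive and sum_cpu_cores are hypothetical names, and the sketch assumes the process command lines contain the string "skydive". Note that the %CPU column of ps is averaged over the lifetime of each process, so it is not an instantaneous reading.

```shell
# Keep the header row plus any line mentioning skydive (case-insensitive).
# Reads `ps aux` style output on stdin. The [s] bracket keeps this awk
# process itself from matching when it shows up in the ps listing.
filter_skydive() {
  awk 'NR == 1 || tolower($0) ~ /[s]kydive/'
}

# Sum the %CPU column (3rd field of `ps aux`), skipping the header row,
# and print the total in cores (100% == 1 core).
sum_cpu_cores() {
  awk 'NR > 1 { total += $3 } END { printf "%.2f\n", total / 100 }'
}

# Usage on the node running the agent or analyzer:
#   ps aux --sort=-pcpu | filter_skydive
#   ps aux --sort=-pcpu | filter_skydive | sum_cpu_cores
```

A result close to 2.00 would mean the processes are pinned at the 2-core limit; a result near 0.10 would point to the request value instead.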

Jan 11 '21 07:01 eranra

@ashishth09 Is this still relevant? If not, please close.

Feb 01 '21 11:02 eranra
