Agent and analyzer taking lots of cpu cycles and memory

Open ashishth09 opened this issue 4 years ago • 7 comments

  insight                    netflow-collector-skydive-agent-b26tm                     100m (2%)     2 (51%)      512Mi (3%)       8Gi (61%)      27d
  insight                    netflow-collector-skydive-analyzer-7497cd8b79-5gz8v       100m (2%)     2 (51%)      512Mi (3%)       8Gi (61%)      27d
  insight                    skydive-operator-67c958f454-4rxl6                         0 (0%)        0 (0%)       0 (0%)           0 (0%)         27d

We can see from the snapshot above that the agent and analyzer are each taking 2 cores (and have exceeded the limit we put on them), with requests set to 100m. What is the reason behind such high CPU usage? Memory consumption is also very high: 61% of the assigned 8Gi limit. We used https://github.com/skydive-project/skydive-operator to install the operator on the IBM Cloud classic infrastructure Kubernetes service.

Kubernetes version is v1.16.14

ashishth09 avatar Jan 07 '21 11:01 ashishth09

Hi @ashishth09, can you tell us more about the environment: how many hosts and pods are there, and what workload is running in that environment?

Also: (1) I see 27 days. Is the high CPU usage new (appearing after a long time), or are things OK after a reboot? (2) To the best of my knowledge, 100m is 0.1 of a core (100 millicores). Can you paste the command that you used to get that output?
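To make the units concrete: in Kubernetes CPU quantities, the "m" suffix means millicores, so 1000m equals one core. Below is a minimal Python sketch that converts the two values from the snapshot ("100m" and "2") to cores. The helper name cpu_quantity_to_cores is made up for illustration and only handles plain and millicore values.

```python
def cpu_quantity_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "100m" or "2" to cores.

    Hypothetical helper for illustration; it only handles plain numbers
    and the millicore ("m") suffix.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 1000m == 1 core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


# Values taken from the snapshot at the top of the issue.
request_cores = cpu_quantity_to_cores("100m")
limit_cores = cpu_quantity_to_cores("2")
print(f"request = {request_cores} cores, limit = {limit_cores} cores")
```

With these inputs, the request is a tenth of a core and the limit is two full cores. Neither number says how much CPU the pod is actually using.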

eranra avatar Jan 07 '21 12:01 eranra

@eranra We are gathering the information requested above and will share it once we have it. We used the kubectl top command.

ajaysikdar avatar Jan 07 '21 16:01 ajaysikdar

@eranra

Skydive was installed on 20+ clusters. The high CPU usage appeared right after installation, and a reboot didn't help. We had to uninstall it completely.

ajaysikdar avatar Jan 08 '21 13:01 ajaysikdar

@ashishth09

  • How many nodes did you have on each cluster, how many pods on each node, and how many flows per second? I am trying to understand the amount of traffic in your systems to see whether it is related. On typical machines captures are not very CPU-intensive, but if you configure multiple captures (on many interfaces) and have a lot of traffic, CPU usage might go high.

There are multiple alternatives: some reduce the number of parallel captures, others move to more efficient capture types such as eBPF (http://skydive.network/blog/skydive-with-ebpf.html). But before going there, you should check whether Skydive really consumes 2 CPUs, or something much lower, closer to the 100m request.

eranra avatar Jan 08 '21 14:01 eranra

@eranra,

Do you know how we can verify whether Skydive really consumes 2 CPUs, or much less than that?

ajaysikdar avatar Jan 11 '21 05:01 ajaysikdar

@ashishth09 The easiest way, if you can connect to the k8s node (the VM), is to run something like ps aux --sort=-pcpu | head -n 10, which shows the top CPU-consuming processes on the node. Make sure you connect to the correct node, the one running the Skydive analyzer and agents. Another option is to increase the limit in the Deployment and DaemonSet and see if Skydive works better. For example, change it to 1000m (== 1 virtual CPU), wait for the new pods to be deployed, and check that the limit really changed.
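If picking the Skydive processes out of the ps output by eye is awkward, the same check can be scripted. The following is a rough Python sketch. It assumes a Linux node with procps ps (which supports --sort) and the usual 11-column ps aux layout. The function names run_ps and top_cpu_processes are hypothetical, and filtering on the keyword "skydive" assumes the process command lines contain that string.

```python
import subprocess


def run_ps() -> str:
    """Return the output of `ps aux --sort=-pcpu` (Linux procps syntax)."""
    return subprocess.run(
        ["ps", "aux", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    ).stdout


def top_cpu_processes(ps_output: str, keyword: str = "", limit: int = 10):
    """Return (pid, cpu_percent, command) tuples, busiest first.

    Only processes whose command line contains `keyword` are kept
    (case-insensitive); an empty keyword keeps every process.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        # ps aux has 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, cpu, command = int(fields[1]), float(fields[2]), fields[10]
        if keyword.lower() in command.lower():
            rows.append((pid, cpu, command))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

On the node, calling top_cpu_processes(run_ps(), "skydive") would list the matching processes with their %CPU. The %CPU column in ps is CPU time divided by process lifetime, so a value near 200 would roughly correspond to two cores on average.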

eranra avatar Jan 11 '21 07:01 eranra

@ashishth09 Is this still relevant? If not, please close the issue.

eranra avatar Feb 01 '21 11:02 eranra