skydive
Agent and analyzer taking lots of cpu cycles and memory
insight netflow-collector-skydive-agent-b26tm 100m (2%) 2 (51%) 512Mi (3%) 8Gi (61%) 27d
insight netflow-collector-skydive-analyzer-7497cd8b79-5gz8v 100m (2%) 2 (51%) 512Mi (3%) 8Gi (61%) 27d
insight skydive-operator-67c958f454-4rxl6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 27d
The snapshot above shows the agent and analyzer each taking 2 cores (exceeding the limit we put on them), with requests of 100m. What is the reason behind such high CPU usage? Memory consumption is also very high: 61% of the assigned 8Gi limit. We used https://github.com/skydive-project/skydive-operator to install the operator on the IBM Cloud classic infrastructure Kubernetes service.
Kubernetes version is v1.16.14
Hi @ashishth09, can you tell us more about the environment: how many hosts and pods are there, and what workload is running?
Also: (1) I see 27 days. Is the high CPU new (appearing after a long time), and are things OK after a reboot? (2) To the best of my knowledge, 100m is 0.1 of a core (10%), not 2 cores. Can you paste the command that you used to get that output?
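To make the units in the snapshot concrete: Kubernetes expresses CPU quantities either in whole cores (`2`) or in millicores (`100m`), where 1000m equals one core. The helper below is only a minimal sketch of that conversion; the function name `to_millicores` is made up for illustration and is not part of kubectl or Skydive.

```shell
# Convert a Kubernetes CPU quantity ("100m", "2", "0.5") to millicores.
# to_millicores is a hypothetical helper, not a kubectl or Skydive command.
to_millicores() {
  case "$1" in
    # Already in millicores: strip the trailing "m".
    *m) echo "${1%m}" ;;
    # Whole or fractional cores: multiply by 1000.
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;
  esac
}
```

With this, `to_millicores 100m` and `to_millicores 2` give values in the same unit, which makes it easier to compare the request and limit columns of the snapshot.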
@eranra
We are gathering the information requested above and will share it once we have it.
We used the kubectl top command.
@eranra
Skydive was installed on 20+ clusters. The high CPU usage appeared right after installation, and rebooting did not help. We had to uninstall it completely.
@ashishth09
- How many nodes did you have in each cluster, how many pods on each node, and how many flows per second? I am trying to understand the amount of traffic in your systems to see whether this is connected. On typical machines captures are not very expensive, but if you configure multiple captures (on many interfaces) and have a lot of traffic, the CPU usage might go high.
There are several alternatives: one is to reduce the number of parallel captures, another is to move to more efficient captures such as eBPF (http://skydive.network/blog/skydive-with-ebpf.html). Before going there, though, you should check whether Skydive really consumes 2 CPUs or much less than that.
@eranra,
Do you know how we can verify whether Skydive really consumes 2 CPUs or much less than that?
@ashishth09 the easiest way, if you can connect to the k8s node (the VM), is to execute something like ps aux --sort=-pcpu | head -n 10
This will show the top 10 CPU-consuming processes on the node. Make sure you connect to the correct node, the one running the Skydive analyzer and agents. Another option is to increase the limit in the Deployment and DaemonSet and see if Skydive works better. For example, change it to 1000m (== 1 virtual CPU), wait for the new pods to be deployed, and check that the limit really changed.
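As a rough sketch of how the ps check could be turned into a single number: the helper below sums the %CPU column for every process whose command name contains a given pattern. The function name `sum_cpu` is hypothetical, and matching on `skydive` assumes the agent and analyzer processes carry that name on the node. Note that ps reports CPU time divided by process lifetime, so this is an average rather than an instantaneous reading, and 100 corresponds to one full core.

```shell
# Sum %CPU for processes whose command name contains the given pattern.
# Reads "pcpu comm" pairs on stdin, one process per line.
# sum_cpu is a hypothetical helper; "skydive" as process name is an assumption.
sum_cpu() {
  awk -v pat="$1" 'index($2, pat) { total += $1 } END { printf "%.1f\n", total + 0 }'
}

# Intended use on the node (Linux procps ps):
#   ps -eo pcpu,comm --no-headers | sum_cpu skydive
```

A result close to 200 would suggest roughly two cores in use, while a value near 10 would be in line with the 100m request.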
@ashishth09 is this still relevant? If not, please close.