
Reduce CPU utilization

Open amshuman-kr opened this issue 4 years ago • 11 comments

Issue

During performance tests, the CPU utilisation of MCM was comparable to that of kube-apiserver and even higher than that of etcd. This is surprising, as MCM handles at least two orders of magnitude fewer resources than the other control-plane components.

Reducing the CPU utilisation will help improve the scalability of MCM as well as Gardener.

Solution

Profile and optimize the CPU utilization of MCM.

amshuman-kr avatar Aug 07 '19 12:08 amshuman-kr

I think most of the CPU utilization is due to the reconciliation of machine objects. They get reconciled on every node object update, and a node object is updated every 10-30s because of condition checks such as memory, disk, network, and readiness. I think we could reduce this reconciliation interval either in MCM or at the node level. We can take a call while fixing this issue.
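
One way to cut reconciliations at the MCM level could be to skip node updates that only refresh heartbeat timestamps. The following is a minimal sketch of that idea, not MCM's actual code: `NodeCondition` is a simplified, hypothetical stand-in for the Kubernetes node condition type, and `conditionsChanged` is an illustrative helper name.

```go
package main

import (
	"fmt"
	"time"
)

// NodeCondition is a simplified, hypothetical stand-in for the Kubernetes
// node condition type; only the fields needed for the comparison are kept.
type NodeCondition struct {
	Type              string
	Status            string
	LastHeartbeatTime time.Time
}

// conditionsChanged reports whether a condition was added, removed, or
// changed status. Heartbeat timestamps are deliberately ignored.
// Assumption: condition types are unique within each slice.
func conditionsChanged(oldConds, newConds []NodeCondition) bool {
	if len(oldConds) != len(newConds) {
		return true
	}
	oldStatus := make(map[string]string, len(oldConds))
	for _, c := range oldConds {
		oldStatus[c.Type] = c.Status
	}
	for _, c := range newConds {
		status, ok := oldStatus[c.Type]
		if !ok || status != c.Status {
			return true
		}
	}
	return false
}

func main() {
	now := time.Now()
	later := now.Add(10 * time.Second)

	old := []NodeCondition{
		{Type: "Ready", Status: "True", LastHeartbeatTime: now},
		{Type: "MemoryPressure", Status: "False", LastHeartbeatTime: now},
	}
	heartbeatOnly := []NodeCondition{
		{Type: "Ready", Status: "True", LastHeartbeatTime: later},
		{Type: "MemoryPressure", Status: "False", LastHeartbeatTime: later},
	}
	notReady := []NodeCondition{
		{Type: "Ready", Status: "False", LastHeartbeatTime: later},
		{Type: "MemoryPressure", Status: "False", LastHeartbeatTime: later},
	}

	// Only the second update would need to trigger a machine reconcile.
	fmt.Println("heartbeat only:", conditionsChanged(old, heartbeatOnly))
	fmt.Println("status change: ", conditionsChanged(old, notReady))
}
```

In a controller, a check like this would sit in the node update handler, before the machine is added to the work queue. Whether skipping heartbeat-only updates is safe for MCM's health checks would need to be verified.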

prashanth26 avatar Aug 08 '19 05:08 prashanth26

I think we should also look at the number of goroutines and retry logic. But let's profile first.
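
As a starting point for profiling, here is a minimal sketch using only the Go standard library (`runtime/pprof` and `runtime.NumGoroutine`). The helper name `captureCPUProfile` and the dummy workload are assumptions for illustration; in MCM the workload would be the running controllers.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureCPUProfile records a CPU profile while work runs and returns the
// raw profile bytes, which can be written to a file and opened with
// `go tool pprof`. The helper name is illustrative.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	// A growing goroutine count over time is a first hint of a leak.
	fmt.Println("goroutines:", runtime.NumGoroutine())

	profile, err := captureCPUProfile(func() {
		sum := 0
		for i := 0; i < 50000000; i++ { // dummy CPU-bound workload
			sum += i
		}
		_ = sum
	})
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Println("profile size in bytes:", len(profile))
}
```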

amshuman-kr avatar Nov 22 '19 06:11 amshuman-kr

Hi, we see constantly increasing CPU usage of MCM in our environment with only 3 nodes in the cluster. pprof shows a vast number of parked threads:

runtime.gopark
/usr/local/go/src/runtime/proc.go

  Total:       29300      29300 (flat, cum)   100%
runtime.selectgo
/usr/local/go/src/runtime/select.go

  Total:           2      29113 (flat, cum) 99.35%

But I must admit that I have no idea how this could happen, tbh. Ideas?
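
For illustration, a profile dominated by `runtime.gopark` under `runtime.selectgo` is what goroutines blocked in a `select` look like. The sketch below shows one generic way such goroutines pile up and how a stop channel lets them exit. It is not taken from the MCM code base, and `watcherPool` and `startWatchers` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// watcherPool is a hypothetical helper that owns a set of goroutines
// blocked in select.
type watcherPool struct {
	stop   chan struct{}
	wg     sync.WaitGroup
	exited int64
}

// startWatchers launches n goroutines that park in select until an event
// arrives or the pool is stopped.
func startWatchers(n int, events <-chan struct{}) *watcherPool {
	p := &watcherPool{stop: make(chan struct{})}
	p.wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer func() {
				atomic.AddInt64(&p.exited, 1)
				p.wg.Done()
			}()
			for {
				select {
				case <-events: // handle an event (no-op in this sketch)
				case <-p.stop: // without this case the goroutine stays parked forever
					return
				}
			}
		}()
	}
	return p
}

// Stop closes the stop channel, waits for every watcher to return, and
// reports how many exited.
func (p *watcherPool) Stop() int {
	close(p.stop)
	p.wg.Wait()
	return int(atomic.LoadInt64(&p.exited))
}

func main() {
	events := make(chan struct{}) // nothing is ever sent on this channel
	pool := startWatchers(1000, events)
	fmt.Println("goroutines while parked:", runtime.NumGoroutine())
	fmt.Println("watchers exited after Stop:", pool.Stop())
}
```

If watchers like these are started repeatedly, for example on every resync, and never stopped, the parked goroutine count keeps growing.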

majst01 avatar Dec 02 '19 08:12 majst01

@majst01 #341 addressed the constant increase in CPU usage. Does it not work for you?

We have kept the current issue open to track the optimisation of the baseline CPU usage.

amshuman-kr avatar Dec 02 '19 08:12 amshuman-kr

We already have #341. I will try the most recent version as well and report back.

majst01 avatar Dec 02 '19 09:12 majst01

We already have #341

Thanks. We saw improvement with #341. But this is good information for us. We will also check from our end.

amshuman-kr avatar Dec 02 '19 09:12 amshuman-kr

I doubt that the changes upstream since #341 change any behavior here. I will also look for unclosed channels et al.

majst01 avatar Dec 02 '19 09:12 majst01

I doubt that the changes upstream since #341 change any behavior here.

Yes. If you already have #341, there are no further relevant changes that might help.

amshuman-kr avatar Dec 02 '19 09:12 amshuman-kr

I tried to run https://github.com/golangci/golangci-lint on the code base, but it failed with:

WARN [runner] Can't run linter goanalysis_metalinter: assign: failed prerequisites: inspect@github.com/gardener/machine-controller-manager/pkg/client/clientset/internalversion/typed/machine/internalversion
WARN [runner] Can't run linter unused: buildssa: analysis skipped: errors in package: [/home/stefan/dev/devops/cloud-native/metal/metal-pod/machine-controller-manager/pkg/client/clientset/internalversion/typed/machine/internalversion/machine_client.go:10:6: MachineInterface redeclared in this block /home/stefan/dev/devops/cloud-native/metal/metal-pod/machine-controller-manager/pkg/client/clientset/internalversion/typed/machine/internalversion/machine.go:21:6:       other declaration of MachineInterface /home/stefan/dev/devops/cloud-native/metal/metal-pod/machine-controller-manager/pkg/client/clientset/internalversion/typed/machine/internalversion/machine.go:112:20: machine.Machine undefined (type *machine.Machine has no field or method Machine) /home/stefan/dev/devops/cloud-native/metal/metal-pod/machine-controller-manager/pkg/client/clientset/internalversion/typed/machine/internalversion/machine.go:97:20: machine.Machine undefined (type *machine.Machine has no field or method Machine) /home/stefan/dev/devops/cloud-native/metal/metal-pod/machine-controller-manager/pkg/client/clientset/internalversion/typed/machine/internalversion/machine.go:85:20: machine.Machine undefined (type *machine.Machine has no field or method Machine)]

We lint all of our code in CI to prevent obvious bugs, but this kind of problem never occurred.

majst01 avatar Dec 02 '19 09:12 majst01

We do a lint check here: https://github.com/gardener/machine-controller-manager/blob/master/.ci/check#L61, in case it helps.

Also, what version of MCM were you rebasing/using?

hardikdr avatar Dec 02 '19 09:12 hardikdr

We are using master

majst01 avatar Dec 02 '19 09:12 majst01