
CPU Spikes on Openshift with unusual operator behaviour

Open Mohid-A opened this issue 2 years ago • 7 comments

Hi Community,

We are running into an issue that occurs when the camel-k operator restarts while more than four integrations are running on OCP (version mentioned below). After the restart, the operator keeps reconciling the integrations continuously, which causes CPU spikes on the master node and also adds latency on the kube-api server. The logs for the issue are shown below.

VERSION

Camel-k-operator 1.9.1, Camel K Client 1.9.1, OCP 4.9.37, using Kubernetes 1.22

Command to produce the Issue

kamel --kube-config=$QA_KUBECONFIG run $APP(integration file variable) --trait container.enabled=$ENABLED --trait container.request-cpu=$REQUESTCPU --trait container.request-memory=$REQUESTMEMORY --trait container.limit-cpu=$LIMITCPU --trait container.limit-memory=$LIMITMEMORY --trait jvm.options=-Doracle.jdbc.timezoneAsRegion=false --pod-template $PVC2 --config secret:$SECRET --config configmap:$CONFIGMAP -t logging.level=DEBUG

Error Log

{"level":"info","ts":1657633698.0446534,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.0447812,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.3031335,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.3032014,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.5536115,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.5536752,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.9342558,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.9343414,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1299353,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1300168,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}

Expected Behavior

We want the operator to be stable upon restart, as restarting has an impact on the platform and other workloads.

Thanks

Mohid-A avatar Jul 12 '22 13:07 Mohid-A

@christophd Have you encountered this before?

heiko-braun avatar Jul 12 '22 14:07 heiko-braun

Thanks for reporting the problem. I managed to replicate the issue on a local environment as well. Strangely, this happens whenever there are more than a few integrations running (I tried with 5). If you stop the operator pod, as soon as it restarts, it repeatedly tries reconciling all the running integrations for a few seconds. In my case it stops after less than a minute, but it is worth investigating to see how to fix it.

squakez avatar Jul 12 '22 14:07 squakez

For us, when the operator pod restarts, the reconciling does not stop. The only fix we found was to delete running integrations and bring the count down to 4, which stops this operator behavior.

Mohid-A avatar Jul 12 '22 15:07 Mohid-A

Is this a Camel K operator issue or an environment (OpenShift) issue? If this is an operator issue, is there another Camel K version (1.6.0 or 1.6.3) that might be stable on the OpenShift environment?

gtata007 avatar Jul 12 '22 17:07 gtata007

If I remember correctly, @christophd and @astefanutti talked about this recently

heiko-braun avatar Jul 12 '22 17:07 heiko-braun

If I understand the issue correctly, it is twofold:

1. All the Integration resources are reconciled upon the operator startup:

This is the standard operator behavior, i.e., all the managed resources are reconciled once, so any changes to their state, that could have occurred while the operator was down, are taken into account, so the system can achieve eventual consistency. That indeed may cause a spike w.r.t. compute resources and API server requests. We could look into further tuning the client side QPS and Burst parameters that control API request throttling. These have been increased as part of #2814, but we could make them configurable.

2. The reconciliation goes on indefinitely:

This may be an occurrence of the issue fixed by #3285, which has not been released yet and will be part of the upcoming 1.9.3 version.

astefanutti avatar Jul 13 '22 08:07 astefanutti

Thanks for the feedback @astefanutti. I've just tested with 1.10-nightly and I confirm the indefinite reconciliation loop has been fixed:

camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.6277504,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it2"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.955836,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it3"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.074853,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it4"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.6067405,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it5"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.981502,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it1"}

I am keeping this open until we officially release both 1.10 and 1.9.3.

squakez avatar Jul 13 '22 08:07 squakez

I mistakenly put it to 1.11.0. Moving it back to 1.10.0 as it can be closed once we release 1.10.0.

tadayosi avatar Aug 25 '22 04:08 tadayosi