camel-k
CPU Spikes on OpenShift with unusual operator behaviour
Hi Community,
We are running into an issue when the camel-k operator restarts while more than four integrations are running on OCP (version below). After the restart, the operator keeps reconciling the integrations continuously, which causes CPU spikes on the master node and also increases latency on the kube-apiserver. The logs for the issue are shown below:
VERSION
Camel K operator 1.9.1
Camel K client 1.9.1
OCP 4.9.37 (Kubernetes 1.22)
Command to reproduce the issue ($APP is the variable holding the integration file):
kamel --kube-config=$QA_KUBECONFIG run $APP \
  --trait container.enabled=$ENABLED \
  --trait container.request-cpu=$REQUESTCPU \
  --trait container.request-memory=$REQUESTMEMORY \
  --trait container.limit-cpu=$LIMITCPU \
  --trait container.limit-memory=$LIMITMEMORY \
  --trait jvm.options=-Doracle.jdbc.timezoneAsRegion=false \
  --pod-template $PVC2 \
  --config secret:$SECRET \
  --config configmap:$CONFIGMAP \
  -t logging.level=DEBUG
Error Log
{"level":"info","ts":1657633698.0446534,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.0447812,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.3031335,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.3032014,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.5536115,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.5536752,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.9342558,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.9343414,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1299353,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1300168,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
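One rough way to tell a single startup pass apart from an endless loop is to count the "Reconciling Integration" entries per integration in the operator log. The sketch below assumes the log has been saved to a file, for example with `oc logs deployment/camel-k-operator -n <operator-namespace> > operator.log` (the deployment name is inferred from the operator pod name and the namespace is a placeholder). The function name `count_reconciles` is hypothetical, and the helper only relies on grep, sed, sort and uniq.

```shell
#!/bin/sh
# Hypothetical helper (not part of kamel): count "Reconciling Integration"
# entries per integration in a Camel K operator log with one JSON object per
# line. Lines prefixed with pod/container names (as printed by some log
# tools) are handled too, since only the JSON fields are matched.
# Usage: count_reconciles operator.log
count_reconciles() {
  grep '"msg":"Reconciling Integration"' "$1" \
    | sed -n 's/.*"request-name":"\([^"]*\)".*/\1/p' \
    | sort \
    | uniq -c \
    | sort -rn   # most frequently reconciled integrations first
}
```

Running it on the same log twice, a few minutes apart, is one way to use it: counts that stay flat after startup suggest a single reconciliation pass, while counts that keep growing point to the loop described in this issue.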
Expected Behavior
We want the operator to be stable upon restart, as restarting has an impact on the platform and other workloads.
Thanks
@christophd Have you encountered this before?
Thanks for reporting the problem. I managed to reproduce the issue in a local environment as well. Strangely, this happens whenever more than a few integrations are running (I tried with 5). If you stop the operator pod, as soon as it restarts it repeatedly reconciles all the running integrations for a few seconds. In my case it stops after less than a minute, but it is worth investigating and seeing how to fix it.
In our case, when the operator pod restarts, the reconciliation does not stop. The only fix we found was to delete running integrations until the count was back to 4, which stops this operator behavior.
Is this a Camel K operator issue or an environment (OpenShift) issue? If it is an operator issue, is there another Camel K version (1.6.0 or 1.6.3) that might be stable on OpenShift?
If I remember correctly, @christophd and @astefanutti talked about this recently.
If I understand the issue correctly, it is twofold:
1. All the Integration resources are reconciled upon the operator startup:
This is the standard operator behavior: all the managed resources are reconciled once, so that any changes to their state that occurred while the operator was down are taken into account and the system can achieve eventual consistency. That can indeed cause a spike in compute resource usage and API server requests. We could look into further tuning the client-side QPS and Burst parameters that control API request throttling. These were increased as part of #2814, but we could make them configurable.
2. The reconciliation goes on indefinitely:
This may be an occurrence of the issue fixed by #3285, which has yet to be released in the upcoming 1.9.3 version.
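To give a feel for how the QPS and Burst parameters mentioned in point 1 bound the startup spike, here is a back-of-envelope sketch. It assumes the client-side throttling behaves like a token bucket (as client-go's default rate limiter does): the bucket starts with Burst tokens and refills at QPS tokens per second, so N requests fired at once need at least max(0, N - Burst) / QPS seconds to get through. The function name `throttle_delay` and the request count are hypothetical; the values 5 and 10 are client-go's defaults, not the Camel K operator's actual settings (which were raised in #2814).

```shell
#!/bin/sh
# Hypothetical helper: estimate the minimum time (in seconds) a token-bucket
# limiter needs to admit N requests issued at the same moment, assuming the
# bucket starts full with BURST tokens and refills at QPS tokens per second.
# Usage: throttle_delay <requests> <qps> <burst>
throttle_delay() {
  awk -v n="$1" -v qps="$2" -v burst="$3" 'BEGIN {
    extra = n - burst             # requests that cannot use the initial burst
    if (extra < 0) extra = 0
    printf "%.2f\n", extra / qps  # each extra request waits for one refill
  }'
}

# Illustrative only: 200 startup requests with client-go defaults (QPS 5, Burst 10)
throttle_delay 200 5 10   # prints 38.00
```

Raising QPS and Burst shortens this catch-up window, but it also lets the same number of requests hit the API server in less time, so the load is more concentrated; that trade-off is one reason making the parameters configurable could help.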
Thanks for the feedback @astefanutti. I've just tested with 1.10-nightly and I can confirm the indefinite reconciliation loop has been fixed:
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.6277504,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it2"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.955836,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it3"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.074853,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it4"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.6067405,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it5"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.981502,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it1"}
I am keeping this open until we officially release both 1.10 and 1.9.3.
I mistakenly set it to 1.11.0. Moving it back to 1.10.0, as it can be closed once we release 1.10.0.