camel-k-operator pod restarts randomly
Bug description
Hi guys, we have a camel-k-operator installed in a namespace on OpenShift 4.19, running with the service account camel-k-operator. The operator pod (we only run one, but leader election is still active, I suppose) has been restarting at random times for the last couple of days. I don't think there is an actual problem with the integrations, but I would like to avoid the pod restarting when it is not necessary.
When I look at the previous pod's logs I see the messages in [1] and [2]. Here is the breakdown of these messages:
- the operator cannot get the lock from the API within 5 seconds when requesting the lease object
- the pod restarts because leader election was lost
The cluster does not have any problems, so I doubt that the API actually takes 5 seconds to answer. Nevertheless, I would like to increase the wait limits. I have not seen anything about this in the documentation - can you give me a hint?
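For reference, a small client-go program like the one below (a diagnostic sketch, not part of Camel K; the namespace dev, the lease name and the 5-second budget are taken from the logs in [1]) could be used to time the same lease GET the operator issues and see whether the API server really gets close to that deadline:

// lease_latency.go - hypothetical diagnostic, not part of Camel K.
// Times the same Lease GET the operator performs when renewing leadership.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (same credentials as "oc").
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Use the same 5s budget that appears in the failing request
	// (".../leases/camel-k-lock?timeout=5s") and measure the round trip.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	start := time.Now()
	lease, err := clientset.CoordinationV1().Leases("dev").Get(ctx, "camel-k-lock", metav1.GetOptions{})
	fmt.Printf("GET took %v (err: %v)\n", time.Since(start), err)
	if err == nil && lease.Spec.HolderIdentity != nil {
		fmt.Printf("current holder: %s\n", *lease.Spec.HolderIdentity)
	}
}

If that call consistently returns in milliseconds, the 5-second deadline is probably only being hit during short load spikes on the API server.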
Of course, the lease exists:
oc get leases.coordination.k8s.io/camel-k-lock -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2025-07-21T10:25:30Z"
  name: camel-k-lock
spec:
  acquireTime: "2025-12-03T15:14:58.120490Z"
  holderIdentity: camel-k-operator-6f5bcdbf6f-f7grs_8833ceeb-f2e4-491b-85d6-f63d74c090b9
  leaseDurationSeconds: 15
  leaseTransitions: 14
  renewTime: "2025-12-03T17:10:58.969750Z"
Thank you very much, Matt
[1]
{"level":"error","ts":"2025-12-03T15:14:47Z","logger":"camel-k.cmd","msg":"error retrieving resource lock dev/camel-k-lock: Get \"https://172.29.0.1:443/apis/coordination.k8s.io/v1/namespaces/dev/leases/camel-k-lock?timeout=5s\": context deadline exceeded","stacktrace":"k8s.io/klog/v2.(*loggingT).output\n\tk8s.io/klog/[email protected]/klog.go:882\nk8s.io/klog/v2.(*loggingT).printfDepth\n\tk8s.io/klog/[email protected]/klog.go:760\nk8s.io/klog/v2.(*loggingT).printf\n\tk8s.io/klog/[email protected]/klog.go:737\nk8s.io/klog/v2.Errorf\n\tk8s.io/klog/[email protected]/klog.go:1597\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).tryAcquireOrRenew\n\tk8s.io/[email protected]/tools/leaderelection/leaderelection.go:436\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).renew.func1.1\n\tk8s.io/[email protected]/tools/leaderelection/leaderelection.go:285\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1\n\tk8s.io/[email protected]/pkg/util/wait/loop.go:53\nk8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext\n\tk8s.io/[email protected]/pkg/util/wait/loop.go:54\nk8s.io/apimachinery/pkg/util/wait.PollUntilContextTimeout\n\tk8s.io/[email protected]/pkg/util/wait/poll.go:48\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).renew.func1\n\tk8s.io/[email protected]/tools/leaderelection/leaderelection.go:283\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:233\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext.func1\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:255\nk8s.io/apimachinery/pkg/util/wait.BackoffUntilWithContext\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:256\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:233\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:210\nk8s.io/apimachinery/pkg/util/wait.Until\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:163\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).renew\n\tk8s.io/[email protected]/tools/leaderelection/leaderelection.go:282\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).Run\n\tk8s.io/[email protected]/tools/leaderelection/leaderelection.go:221\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).Start.func3\n\tsigs.k8s.io/[email protected]/pkg/manager/internal.go:449"}
[2]
{"level":"info","ts":"2025-12-03T15:14:47Z","logger":"camel-k.cmd","msg":"failed to renew lease dev/camel-k-lock: context deadline exceeded"} {"level":"error","ts":"2025-12-03T15:14:47Z","logger":"camel-k.cmd","msg":"manager exited non-zero","error":"leader election lost","stacktrace":"github.com/apache/camel-k/v2/pkg/util/log.Logger.Error\n\tgithub.com/apache/camel-k/v2/pkg/util/log/log.go:80\ngithub.com/apache/camel-k/v2/pkg/cmd/operator.exitOnError\n\tgithub.com/apache/camel-k/v2/pkg/cmd/operator/operator.go:280\ngithub.com/apache/camel-k/v2/pkg/cmd/operator.Run\n\tgithub.com/apache/camel-k/v2/pkg/cmd/operator/operator.go:236\ngithub.com/apache/camel-k/v2/pkg/cmd.(*operatorCmdOptions).run\n\tgithub.com/apache/camel-k/v2/pkg/cmd/operator.go:71\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:1019\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1148\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:1071\nmain.main\n\t./main.go:43\nruntime.main\n\truntime/proc.go:285"}
Camel K or runtime version
v2.8.0
This does not seem to be a Camel K problem. The issue is a timeout when calling the API server: the lease GET hits its 5-second deadline, the renewal fails, and the manager exits with "leader election lost", which is what restarts the pod. The cause may be sizing or overload of the API server. We don't have any parameter to control the timeout. This is the part of the code where you can have a look:
https://github.com/apache/camel-k/blob/10e66c23267f2eb24ae2c79723f985bc10cbc898/pkg/cmd/operator/operator.go#L204-L212
You may try to introduce an environment variable: when it is set, the leader election timeout would be set to that value. Feel free to propose a PR for that.
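As a rough sketch of what such a change could look like (the environment variable names are hypothetical; LeaseDuration, RenewDeadline and RetryPeriod are existing controller-runtime manager options, and 15s/10s/2s are the controller-runtime defaults - this is not the actual Camel K code):

// Sketch only: hypothetical environment variables mapped onto
// controller-runtime's leader election options. Not the actual Camel K code.
package operator

import (
	"os"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// durationFromEnv parses a duration from the named variable and falls back
// to the given default when the variable is unset or malformed.
func durationFromEnv(name string, def time.Duration) *time.Duration {
	if v := os.Getenv(name); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return &d
		}
	}
	return &def
}

// leaderElectionOptions builds manager options that honour the (hypothetical)
// LEADER_ELECTION_* variables; the fallbacks are the controller-runtime defaults.
func leaderElectionOptions(namespace string) manager.Options {
	return manager.Options{
		LeaderElection:          true,
		LeaderElectionID:        "camel-k-lock",
		LeaderElectionNamespace: namespace,
		LeaseDuration:           durationFromEnv("LEADER_ELECTION_LEASE_DURATION", 15*time.Second),
		RenewDeadline:           durationFromEnv("LEADER_ELECTION_RENEW_DEADLINE", 10*time.Second),
		RetryPeriod:             durationFromEnv("LEADER_ELECTION_RETRY_PERIOD", 2*time.Second),
	}
}

Something along these lines would let users raise the renew deadline on clusters where the API server occasionally responds slowly, while keeping today's defaults.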