client-go
LeaderElection logic stops after a leader fails to update its lock
I imported client-go v0.22.2 and used its leader election logic, with only one pod deployed. When that pod fails to update the lock (for whatever reason), it never runs the OnStartedLeading() callback again, even though the pod still holds the Lease resource.
My leader election code is below.
```go
package main

import (
	"context"
	"time"

	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func runLeaderElection(lock *resourcelock.LeaseLock, ctx context.Context, id string) {
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   10 * time.Second, // how long non-leaders wait before forcing acquisition
		RenewDeadline:   5 * time.Second,  // how long the leader retries renewal before giving up
		RetryPeriod:     2 * time.Second,  // wait between acquire/renew attempts
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(c context.Context) {
				// do something
			},
			OnStoppedLeading: func() {
				klog.V(3).Info("no longer the leader, staying inactive and stop metering & event logging")
				// do something
			},
			OnNewLeader: func(currentID string) {
				if currentID == id {
					klog.V(3).Info("still the leader!")
					return
				}
				klog.V(3).Info("new leader is ", currentID)
			},
		},
	})
}
```
The logs look like this:
```
I0825 13:31:24.323896 1 leaderelection.go:248] attempting to acquire leader lease hypercloud5-system/hypercloud5-api-server...
I0825 13:31:24.524691 1 main.go:677] new leader is hypercloud5-api-server-5648fb66bf-vdp8w
I0825 13:31:37.133639 1 leaderelection.go:258] successfully acquired lease hypercloud5-system/hypercloud5-api-server
I0825 13:31:37.133731 1 main.go:674] still the leader!
E0825 13:59:46.770035 1 leaderelection.go:367] Failed to update lock: Put "https://10.121.0.1:443/apis/coordination.k8s.io/v1/namespaces/hypercloud5-system/leases/hypercloud5-api-server": context deadline exceeded
I0825 13:59:46.770085 1 leaderelection.go:283] failed to renew lease hypercloud5-system/hypercloud5-api-server: timed out waiting for the condition
E0825 13:59:53.375311 1 leaderelection.go:306] Failed to release lock: Operation cannot be fulfilled on leases.coordination.k8s.io "hypercloud5-api-server": the object has been modified; please apply your changes to the latest version and try again
I0825 13:59:53.375334 1 main.go:669] no longer the leader, staying inactive and stop metering service
```
The logs show that the OnStoppedLeading() callback is called. However, no new leader election happens after these logs. Interestingly, the pod still holds the lease!
I suspect the Update() call inside release() in leaderelection.go:
```go
// release attempts to release the leader lease if we have acquired it.
func (le *LeaderElector) release() bool {
	if !le.IsLeader() {
		return true
	}
	now := metav1.Now()
	leaderElectionRecord := rl.LeaderElectionRecord{
		LeaderTransitions:    le.observedRecord.LeaderTransitions,
		LeaseDurationSeconds: 1,
		RenewTime:            now,
		AcquireTime:          now,
	}
	if err := le.config.Lock.Update(context.TODO(), leaderElectionRecord); err != nil { // <--- HERE
		klog.Errorf("Failed to release lock: %v", err)
		return false
	}
	le.setObservedRecord(&leaderElectionRecord)
	return true
}
```
If Update() fails inside release(), the function just returns false, and I think the elector gets stuck there.
What is the problem? How can I fix it?