
Failed schedule deletion seen in draino logs

Open tarunptala opened this issue 3 years ago • 3 comments

Info

  • kOps cluster, version 1.16+, running on AWS.
  • The planetlabs/draino:e0d5277 image is being used.

We constantly see messages like the following in the draino logs:

2021-02-19T21:34:50.669Z	ERROR	kubernetes/drainSchedule.go:68	Failed schedule deletion	{"key": "ip-10-53-32-128.us-west-2.compute.internal"}
github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).DeleteSchedule
	/go/src/github.com/planetlabs/draino/internal/kubernetes/drainSchedule.go:68
github.com/planetlabs/draino/internal/kubernetes.(*DrainingResourceEventHandler).OnDelete
	/go/src/github.com/planetlabs/draino/internal/kubernetes/eventhandler.go:152
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnDelete
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:251
k8s.io/client-go/tools/cache.(*processorListener).run.func1.1
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:609
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:284
k8s.io/client-go/tools/cache.(*processorListener).run.func1
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:601
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
k8s.io/client-go/tools/cache.(*processorListener).run
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:599
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71
2021-02-23T11:14:33.146Z	ERROR	kubernetes/drainSchedule.go:68	Failed schedule deletion	{"key": "ip-10-53-31-9.us-west-2.compute.internal"}
2021-02-23T11:15:47.174Z	ERROR	kubernetes/drainSchedule.go:68	Failed schedule deletion	{"key": "ip-10-53-32-225.us-west-2.compute.internal"}
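
Looking at the stack trace, the OnDelete handler in eventhandler.go ends up calling DrainSchedules.DeleteSchedule whenever a node object is removed. The snippet below is only a rough sketch of that general pattern with made-up names (it is not draino's actual code): a delete callback tries to drop a per-node schedule and logs an error when no schedule exists for that node, which would surface exactly this kind of log line.

// Illustrative only: hypothetical names, not draino's real implementation.
package main

import (
	"fmt"
	"log"
	"sync"
)

type drainSchedules struct {
	mu    sync.Mutex
	items map[string]struct{} // node name -> pending drain schedule
}

// DeleteSchedule removes the schedule for a node and fails if none exists.
func (d *drainSchedules) DeleteSchedule(key string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.items[key]; !ok {
		return fmt.Errorf("no drain schedule found for node %q", key)
	}
	delete(d.items, key)
	return nil
}

// onNodeDelete is what an informer's delete handler would call when a node
// object disappears from the cluster (for example after the instance is
// terminated).
func onNodeDelete(d *drainSchedules, nodeName string) {
	if err := d.DeleteSchedule(nodeName); err != nil {
		log.Printf("Failed schedule deletion: key=%s err=%v", nodeName, err)
	}
}

func main() {
	d := &drainSchedules{items: map[string]struct{}{}}
	// The node was never scheduled for draining, so the delete path logs an error.
	onNodeDelete(d, "ip-10-53-32-128.us-west-2.compute.internal")
}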

I'm not sure if I am using the correct image of draino.

Note: I have already deployed node-problem-detector and cluster-autoscaler alongside it.

tarunptala avatar Feb 27 '21 09:02 tarunptala

I just had some luck by changing the RBAC. Specifically, I added a rule granting update and patch on nodes/status (apiGroups: [''], resources: [nodes/status], verbs: [update, patch]); the full ClusterRole now looks like this:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels: {component: draino}
  name: draino
rules:
- apiGroups: [apps]
  resources: [statefulsets]
  verbs: [create, update, get, watch, list]
- apiGroups: ['']
  resources: [endpoints]
  verbs: [create, update, get, watch, list]
- apiGroups: ['']
  resources: [events]
  verbs: [create, patch, update]
- apiGroups: ['']
  resources: [nodes]
  verbs: [get, watch, list, update]
- apiGroups: ['']
  resources: [nodes/status]
  verbs: [update, patch]
- apiGroups: ['']
  resources: [pods]
  verbs: [get, watch, list]
- apiGroups: ['']
  resources: [pods/eviction]
  verbs: [create]
- apiGroups: [extensions]
  resources: [daemonsets]
  verbs: [get, watch, list]
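
If you try this, it may be worth confirming that the service account actually picked up the new permission before restarting draino, for example with an impersonated access check (the namespace and service account name below are just placeholders for whatever your deployment uses):

kubectl auth can-i update nodes/status --as=system:serviceaccount:kube-system:draino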

jrivers96 avatar Mar 03 '21 21:03 jrivers96

I have the same errors; granting update on nodes/status did not help. For me this happens when the cluster-autoscaler terminates an instance.

kubernetes: v1.20.5
draino: planetlabs/draino:e0d5277
cluster-autoscaler: v1.20.0

pschulten avatar Mar 25 '21 14:03 pschulten

Same error here; I will try the suggestion with the RBAC permissions!

Daniel-Ebert avatar Jun 22 '21 07:06 Daniel-Ebert