kube-scheduler doesn't work properly after reboot
What happened?
I successfully installed a Kubernetes cluster on my RHEL servers. However, kube-scheduler stops working properly after the master node reboots: completed jobs and terminated pods are no longer cleaned up. I installed two different clusters, and both show the same issue. The kube-scheduler logs show that it cannot access several resources, even though the system:kube-scheduler ClusterRole looks correct to me.
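For reference, the logs below can be collected with a standard kubectl command like this (the component=kube-scheduler label is what kubeadm-based setups such as kubespray put on the scheduler static pod; adjust if your labels differ):

# Tail the scheduler logs from the static pod on the control plane
kubectl -n kube-system logs -l component=kube-scheduler --tail=100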
kube-scheduler logs
I0426 12:28:37.152855 1 serving.go:348] Generated self-signed cert in-memory
W0426 12:28:39.213496 1 requestheader_controller.go:193] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
W0426 12:28:39.213573 1 authentication.go:368] Error looking up in-cluster authentication configuration: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
W0426 12:28:39.213634 1 authentication.go:369] Continuing without authentication configuration. This may treat all requests as anonymous.
W0426 12:28:39.213663 1 authentication.go:370] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I0426 12:28:39.243027 1 server.go:154] "Starting Kubernetes Scheduler" version="v1.28.6"
I0426 12:28:39.243268 1 server.go:156] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0426 12:28:39.245677 1 secure_serving.go:213] Serving securely on [::]:10259
I0426 12:28:39.245853 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0426 12:28:39.245915 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0426 12:28:39.245972 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0426 12:28:39.250049 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
W0426 12:28:39.250066 1 reflector.go:535] pkg/server/dynamiccertificates/configmap_cafile_content.go:206: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E0426 12:28:39.250101 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E0426 12:28:39.250133 1 reflector.go:147] pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
W0426 12:28:39.253131 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
W0426 12:28:39.253450 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.253593 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSINode: failed to list *v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
W0426 12:28:39.253458 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
E0426 12:28:39.253754 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.ReplicaSet: failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
W0426 12:28:39.253273 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.253807 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
W0426 12:28:39.253292 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
E0426 12:28:39.253848 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
W0426 12:28:39.253307 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0426 12:28:39.253890 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0426 12:28:39.253370 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E0426 12:28:39.253928 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
W0426 12:28:39.253386 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.254033 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIStorageCapacity: failed to list *v1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
W0426 12:28:39.253398 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Namespace: namespaces is forbidden: User "system:kube-scheduler" cannot list resource "namespaces" in API group "" at the cluster scope
E0426 12:28:39.254071 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:kube-scheduler" cannot list resource "namespaces" in API group "" at the cluster scope
W0426 12:28:39.253458 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E0426 12:28:39.254167 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.ReplicationController: failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E0426 12:28:39.253473 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.PersistentVolume: failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
W0426 12:28:39.253160 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E0426 12:28:39.254268 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
W0426 12:28:39.253499 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
E0426 12:28:39.254315 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.PersistentVolumeClaim: failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
W0426 12:28:39.254839 1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.254868 1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
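One way to check whether RBAC itself denies these requests, rather than the scheduler authenticating as a different user, is kubectl impersonation from an admin context (a standard kubectl feature, suggested here as a diagnostic, not something from the original report):

# Test the exact permissions the log complains about, as system:kube-scheduler
kubectl auth can-i list nodes --as=system:kube-scheduler
kubectl auth can-i get configmap/extension-apiserver-authentication -n kube-system --as=system:kube-scheduler

If these answer "yes" while the scheduler still logs "forbidden", the scheduler is probably not presenting the system:kube-scheduler identity after the reboot.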
system:kube-scheduler ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler
  uid: 8a2cce65-9058-48a3-b12f-29bab10f403d
  resourceVersion: '103'
  creationTimestamp: '2024-04-04T11:49:51Z'
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: 'true'
  managedFields:
    - manager: kube-apiserver
      operation: Update
      apiVersion: rbac.authorization.k8s.io/v1
      time: '2024-04-04T11:49:51Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:rbac.authorization.kubernetes.io/autoupdate: {}
          f:labels:
            .: {}
            f:kubernetes.io/bootstrapping: {}
        f:rules: {}
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/system:kube-scheduler
rules:
  - verbs:
      - create
      - patch
      - update
    apiGroups:
      - ''
      - events.k8s.io
    resources:
      - events
  - verbs:
      - create
    apiGroups:
      - coordination.k8s.io
    resources:
      - leases
  - verbs:
      - get
      - update
    apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    resourceNames:
      - kube-scheduler
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - nodes
  - verbs:
      - delete
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - pods
  - verbs:
      - create
    apiGroups:
      - ''
    resources:
      - bindings
      - pods/binding
  - verbs:
      - patch
      - update
    apiGroups:
      - ''
    resources:
      - pods/status
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - replicationcontrollers
      - services
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - apps
    resources:
      - statefulsets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - persistentvolumeclaims
      - persistentvolumes
  - verbs:
      - create
    apiGroups:
      - authentication.k8s.io
    resources:
      - tokenreviews
  - verbs:
      - create
    apiGroups:
      - authorization.k8s.io
    resources:
      - subjectaccessreviews
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - storage.k8s.io
    resources:
      - csinodes
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - namespaces
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - storage.k8s.io
    resources:
      - csidrivers
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - storage.k8s.io
    resources:
      - csistoragecapacities
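The rules above match the RBAC bootstrap defaults, so the ClusterRoleBinding that maps the user to this role may be worth checking as well (the default object name is assumed here):

# Subjects should list User system:kube-scheduler
kubectl describe clusterrolebinding system:kube-scheduler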
What did you expect to happen?
kube-scheduler should keep working properly after the master node reboots.
How can we reproduce it (as minimally and precisely as possible)?
git checkout v2.24.1
docker pull quay.io/kubespray/kubespray:v2.24.1
docker run --rm -it -v "$(pwd)/inventory:/inventory" quay.io/kubespray/kubespray:v2.24.1 bash
ansible-playbook -i /inventory/prod/inventory.ini --diff --become cluster.yml -e kube_version=v1.28.6
OS
NAME="Red Hat Enterprise Linux" VERSION="9.3 (Plow)" ID="rhel" ID_LIKE="fedora" VERSION_ID="9.3" PLATFORM_ID="platform:el9" PRETTY_NAME="Red Hat Enterprise Linux 9.3 (Plow)" ANSI_COLOR="0;31" LOGO="fedora-logo-icon" CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos" HOME_URL=https://www.redhat.com/ DOCUMENTATION_URL=https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9 BUG_REPORT_URL=https://bugzilla.redhat.com/
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9" REDHAT_BUGZILLA_PRODUCT_VERSION=9.3 REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" REDHAT_SUPPORT_PRODUCT_VERSION="9.3"
Version of Ansible
version in quay.io/kubespray/kubespray:v2.24.1
ansible [core 2.15.8]
  config file = /kubespray/ansible.cfg
  configured module search path = ['/kubespray/library']
  ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True
Version of Python
version in quay.io/kubespray/kubespray:v2.24.1
Version of Kubespray (commit)
v2.24.1
Network plugin used
cilium
Full inventory with variables
Command used to invoke ansible
ansible-playbook -i /inventory/prod/inventory.ini --diff --become cluster.yml -e kube_version=v1.28.6
Output of ansible run
Anything else we need to know?
No response
I have the same error as well
I'm hitting the same error in the same situation.
Would you please run the following commands:
kubectl get cm extension-apiserver-authentication -n kube-system
kubectl describe role extension-apiserver-authentication-reader -n kube-system
kubectl describe rolebindings.rbac.authorization.k8s.io system::extension-apiserver-authentication-reader -n kube-system
Hi @wandersonlima
Sure! Here are the outputs:
>kubectl get cm extension-apiserver-authentication -n kube-system
NAME DATA AGE
extension-apiserver-authentication 6 25d
>kubectl describe role extension-apiserver-authentication-reader -n kube-system
Name: extension-apiserver-authentication-reader
Labels: [kubernetes.io/bootstrapping=rbac-defaults](http://kubernetes.io/bootstrapping=rbac-defaults)
Annotations: [rbac.authorization.kubernetes.io/autoupdate:](http://rbac.authorization.kubernetes.io/autoupdate:) true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
configmaps [] [extension-apiserver-authentication] [get list watch]
> kubectl describe [rolebindings.rbac.authorization.k8s.io](http://rolebindings.rbac.authorization.k8s.io/) system::extension-apiserver-authentication-reader -n kube-system
Name: system::extension-apiserver-authentication-reader
Labels: [kubernetes.io/bootstrapping=rbac-defaults](http://kubernetes.io/bootstrapping=rbac-defaults)
Annotations: [rbac.authorization.kubernetes.io/autoupdate:](http://rbac.authorization.kubernetes.io/autoupdate:) true
Role:
Kind: Role
Name: extension-apiserver-authentication-reader
Subjects:
Kind Name Namespace
---- ---- ---------
User system:kube-controller-manager
User system:kube-scheduler
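Since the Role and RoleBinding above look correct, one more thing worth checking (a sketch, assuming the kubeadm-style /etc/kubernetes/scheduler.conf that kubespray generates on the master) is the identity the scheduler actually presents to the API server:

# Decode the client cert embedded in the scheduler kubeconfig and print its
# subject; RBAC uses the CN as the username, so it should show
# CN = system:kube-scheduler
grep client-certificate-data /etc/kubernetes/scheduler.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -subject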
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, lifecycle/stale is applied
> - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
> - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
>
> You can:
> - Reopen this issue with /reopen
> - Mark this issue as fresh with /remove-lifecycle rotten
> - Offer to help out with Issue Triage
>
> Please send feedback to sig-contributor-experience at kubernetes/community.
>
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.