
kube-scheduler doesn't work properly after reboot

Open akoken opened this issue 9 months ago • 6 comments

What happened?

I successfully installed a Kubernetes cluster on my RHEL servers. However, kube-scheduler does not work properly after the master node is rebooted: completed jobs and terminated pods are no longer cleaned up. I installed two different clusters, and both show the same issue. The kube-scheduler logs show that it cannot access some resources, even though the system:kube-scheduler ClusterRole looks correct to me.

kube-scheduler logs
I0426 12:28:37.152855       1 serving.go:348] Generated self-signed cert in-memory
W0426 12:28:39.213496       1 requestheader_controller.go:193] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
W0426 12:28:39.213573       1 authentication.go:368] Error looking up in-cluster authentication configuration: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
W0426 12:28:39.213634       1 authentication.go:369] Continuing without authentication configuration. This may treat all requests as anonymous.
W0426 12:28:39.213663       1 authentication.go:370] To require authentication configuration lookup to succeed, set --authentication-tolerate-lookup-failure=false
I0426 12:28:39.243027       1 server.go:154] "Starting Kubernetes Scheduler" version="v1.28.6"
I0426 12:28:39.243268       1 server.go:156] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0426 12:28:39.245677       1 secure_serving.go:213] Serving securely on [::]:10259
I0426 12:28:39.245853       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0426 12:28:39.245915       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0426 12:28:39.245972       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0426 12:28:39.250049       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
W0426 12:28:39.250066       1 reflector.go:535] pkg/server/dynamiccertificates/configmap_cafile_content.go:206: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
E0426 12:28:39.250101       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
E0426 12:28:39.250133       1 reflector.go:147] pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps "extension-apiserver-authentication" is forbidden: User "system:kube-scheduler" cannot list resource "configmaps" in API group "" in the namespace "kube-system"
W0426 12:28:39.253131       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
W0426 12:28:39.253450       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.253593       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSINode: failed to list *v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
W0426 12:28:39.253458       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
E0426 12:28:39.253754       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.ReplicaSet: failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
W0426 12:28:39.253273       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.253807       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
W0426 12:28:39.253292       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
E0426 12:28:39.253848       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
W0426 12:28:39.253307       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0426 12:28:39.253890       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0426 12:28:39.253370       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
E0426 12:28:39.253928       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
W0426 12:28:39.253386       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.254033       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIStorageCapacity: failed to list *v1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
W0426 12:28:39.253398       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Namespace: namespaces is forbidden: User "system:kube-scheduler" cannot list resource "namespaces" in API group "" at the cluster scope
E0426 12:28:39.254071       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:kube-scheduler" cannot list resource "namespaces" in API group "" at the cluster scope
W0426 12:28:39.253458       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E0426 12:28:39.254167       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.ReplicationController: failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
E0426 12:28:39.253473       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.PersistentVolume: failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
W0426 12:28:39.253160       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
E0426 12:28:39.254268       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
W0426 12:28:39.253499       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
E0426 12:28:39.254315       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.PersistentVolumeClaim: failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
W0426 12:28:39.254839       1 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E0426 12:28:39.254868       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.StorageClass: failed to list *v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
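
For reference, the forbidden errors above all refer to permissions that the default RBAC bootstrap policy grants to the system:kube-scheduler user, so the effective permissions can be cross-checked from an admin kubeconfig by impersonating that user (a quick sanity check, assuming cluster-admin access on a control-plane node):

kubectl auth can-i list nodes --as=system:kube-scheduler
kubectl auth can-i get configmaps/extension-apiserver-authentication -n kube-system --as=system:kube-scheduler
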
system:kube-scheduler ClusterRole

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:kube-scheduler
  uid: 8a2cce65-9058-48a3-b12f-29bab10f403d
  resourceVersion: '103'
  creationTimestamp: '2024-04-04T11:49:51Z'
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: 'true'
  managedFields:
    - manager: kube-apiserver
      operation: Update
      apiVersion: rbac.authorization.k8s.io/v1
      time: '2024-04-04T11:49:51Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:rbac.authorization.kubernetes.io/autoupdate: {}
          f:labels:
            .: {}
            f:kubernetes.io/bootstrapping: {}
        f:rules: {}
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/system:kube-scheduler
rules:
  - verbs:
      - create
      - patch
      - update
    apiGroups:
      - ''
      - events.k8s.io
    resources:
      - events
  - verbs:
      - create
    apiGroups:
      - coordination.k8s.io
    resources:
      - leases
  - verbs:
      - get
      - update
    apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    resourceNames:
      - kube-scheduler
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - nodes
  - verbs:
      - delete
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - pods
  - verbs:
      - create
    apiGroups:
      - ''
    resources:
      - bindings
      - pods/binding
  - verbs:
      - patch
      - update
    apiGroups:
      - ''
    resources:
      - pods/status
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - replicationcontrollers
      - services
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - apps
    resources:
      - statefulsets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - persistentvolumeclaims
      - persistentvolumes
  - verbs:
      - create
    apiGroups:
      - authentication.k8s.io
    resources:
      - tokenreviews
  - verbs:
      - create
    apiGroups:
      - authorization.k8s.io
    resources:
      - subjectaccessreviews
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - storage.k8s.io
    resources:
      - csinodes
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - ''
    resources:
      - namespaces
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - storage.k8s.io
    resources:
      - csidrivers
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - storage.k8s.io
    resources:
      - csistoragecapacities
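
The rules above match the expected defaults, so it may also be worth confirming that the bootstrap ClusterRoleBinding still ties this role to the system:kube-scheduler user (the binding name below is the Kubernetes default, assuming it has not been modified):

kubectl get clusterrolebinding system:kube-scheduler -o yaml
kubectl describe clusterrolebinding system:kube-scheduler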

What did you expect to happen?

kube-scheduler should work properly.

How can we reproduce it (as minimally and precisely as possible)?

git checkout v2.24.1
docker pull quay.io/kubespray/kubespray:v2.24.1
docker run --rm -it -v "$(pwd)/inventory:/inventory" quay.io/kubespray/kubespray:v2.24.1 bash

ansible-playbook -i /inventory/prod/inventory.ini --diff --become cluster.yml -e kube_version=v1.28.6

OS

NAME="Red Hat Enterprise Linux" VERSION="9.3 (Plow)" ID="rhel" ID_LIKE="fedora" VERSION_ID="9.3" PLATFORM_ID="platform:el9" PRETTY_NAME="Red Hat Enterprise Linux 9.3 (Plow)" ANSI_COLOR="0;31" LOGO="fedora-logo-icon" CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos" HOME_URL=https://www.redhat.com/ DOCUMENTATION_URL=https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9 BUG_REPORT_URL=https://bugzilla.redhat.com/

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9" REDHAT_BUGZILLA_PRODUCT_VERSION=9.3 REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" REDHAT_SUPPORT_PRODUCT_VERSION="9.3"

Version of Ansible

version in quay.io/kubespray/kubespray:v2.24.1

ansible [core 2.15.8]
  config file = /kubespray/ansible.cfg
  configured module search path = ['/kubespray/library']
  ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Version of Python

version in quay.io/kubespray/kubespray:v2.24.1

Version of Kubespray (commit)

v2.24.1

Network plugin used

cilium

Full inventory with variables

Command used to invoke ansible

ansible-playbook -i /inventory/prod/inventory.ini --diff --become cluster.yml -e kube_version=v1.28.6

Output of ansible run

Anything else we need to know

No response

akoken avatar Apr 29 '24 09:04 akoken

I have the same error as well

hcank avatar Apr 29 '24 12:04 hcank

I have the same error too in the same situation.

alperbasay avatar Apr 29 '24 12:04 alperbasay

Would you please run the following commands:

kubectl get cm extension-apiserver-authentication -n kube-system
kubectl describe role extension-apiserver-authentication-reader -n kube-system
kubectl describe rolebindings.rbac.authorization.k8s.io system::extension-apiserver-authentication-reader -n kube-system

wandersonlima avatar Apr 29 '24 13:04 wandersonlima

Hi @wandersonlima

Sure! Here are the outputs:

>kubectl get cm extension-apiserver-authentication -n kube-system
NAME                                 DATA   AGE
extension-apiserver-authentication   6      25d
 
>kubectl describe role extension-apiserver-authentication-reader -n kube-system
Name:         extension-apiserver-authentication-reader
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources   Non-Resource URLs  Resource Names                        Verbs
  ---------   -----------------  --------------                        -----
  configmaps  []                 [extension-apiserver-authentication]  [get list watch]
 
> kubectl describe rolebindings.rbac.authorization.k8s.io system::extension-apiserver-authentication-reader -n kube-system
Name:         system::extension-apiserver-authentication-reader
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
Role:
  Kind:  Role
  Name:  extension-apiserver-authentication-reader
Subjects:
  Kind  Name                            Namespace
  ----  ----                            ---------
  User  system:kube-controller-manager
  User  system:kube-scheduler
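
Since the rolebinding includes User system:kube-scheduler, the RBAC objects themselves look intact on this side as well. To see whether the forbidden errors persist or only appear while the control plane is still coming up after the reboot, the scheduler logs can be re-checked once the API server is fully up (assuming the usual kubeadm-style component label on the static pod):

kubectl -n kube-system logs -l component=kube-scheduler --tail=20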

akoken avatar Apr 29 '24 13:04 akoken

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 28 '24 14:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 27 '24 14:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Sep 26 '24 15:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Sep 26 '24 15:09 k8s-ci-robot