
descheduler cannot create an event

tomsunyu opened this issue 2 years ago

I found the following problem:

Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"doa-front-uat-5778f5976-92ffr.171e2f32bc5871c7", GenerateName:"", Namespace:"doa", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"doa", Name:"doa-front-uat-5778f5976-92ffr", UID:"7d167e26-a19e-4a72-b594-b277c6f2972a", APIVersion:"v1", ResourceVersion:"301934531", FieldPath:""}, Reason:"Descheduled", Message:"pod evicted by sigs.k8s.io/deschedulerLowNodeUtilization", Source:v1.EventSource{Component:"sigs.k8s.io.descheduler", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0cab801c7a7ebc7, ext:2403328205, loc:(*time.Location)(0x2c75240)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0cab801c7a7ebc7, ext:2403328205, loc:(*time.Location)(0x2c75240)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:kube-system:descheduler-sa" cannot create resource "events" in API group "" in the namespace "doa"' (will not retry!)

I have applied the rbac.yaml below:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: descheduler-cluster-role
rules:
- apiGroups: ["events.k8s.io"]
  resources: ["events"]
  verbs: ["create", "update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"]
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]
- apiGroups: ["scheduling.k8s.io"]
  resources: ["priorityclasses"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  resourceNames: ["descheduler"]
  verbs: ["get", "patch", "delete"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: descheduler-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: descheduler-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: descheduler-cluster-role
subjects:
  - name: descheduler-sa
    kind: ServiceAccount
    namespace: kube-system

So how can I solve this problem?
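A quick way to check which API group the permission is actually missing from (a sketch reusing the service account and namespace from the error above) is:

# Core ("") group vs. events.k8s.io group; the one that answers "no"
# matches the forbidden error in the log.
kubectl auth can-i create events \
  --as=system:serviceaccount:kube-system:descheduler-sa -n doa
kubectl auth can-i create events.events.k8s.io \
  --as=system:serviceaccount:kube-system:descheduler-sa -n doa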

tomsunyu avatar Oct 15 '22 08:10 tomsunyu

Hi @tomsunyu, which version of descheduler and k8s are you running? Also, are you installing it with helm or manually?

damemi avatar Oct 16 '22 18:10 damemi

1.24.6, 0.25.1, installed with helm (rbac and serviceaccount values untouched)

4c74356b41 avatar Oct 17 '22 07:10 4c74356b41

Actually, if I update the default ClusterRole with this rule:

- apiGroups: ["events.k8s.io"]
  resources: ["events"]
  verbs: ["create", "update"]

it starts working, so your Helm chart has a bug.
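One way to make that edit in place, as a rough sketch (the ClusterRole name below is a placeholder; substitute whatever name your Helm release created, e.g. from kubectl get clusterrole | grep descheduler):

# Append the events.k8s.io rule to the deployed ClusterRole
kubectl patch clusterrole descheduler --type=json -p='[
  {"op": "add", "path": "/rules/-",
   "value": {"apiGroups": ["events.k8s.io"], "resources": ["events"], "verbs": ["create", "update"]}}
]'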

4c74356b41 avatar Oct 17 '22 08:10 4c74356b41

My k8s version is v1.21.0. I have updated the rbac.yaml with the rule below, and the descheduler no longer reports an error.

- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "update"]

tomsunyu avatar Oct 17 '22 08:10 tomsunyu

Good for you! A solid middle ground that covers both cases would be:

- apiGroups: ["events.k8s.io",""]
  resources: ["events"]
  verbs: ["create", "update"]

4c74356b41 avatar Oct 17 '22 10:10 4c74356b41

The default ClusterRole in v0.25.1 does include the events permission that @tomsunyu mentioned above.

@tomsunyu, the RBAC in your original post looks different from the one we provide (it uses events.k8s.io). Did you also install using our Helm chart?

damemi avatar Oct 17 '22 12:10 damemi

@damemi but they don't work :) https://stackoverflow.com/a/69290088/6067741

4c74356b41 avatar Oct 17 '22 12:10 4c74356b41

@4c74356b41 thank you. What I'm trying to figure out is why this reports different errors for different people (see https://github.com/kubernetes-sigs/descheduler/issues/959).

It looks like https://github.com/kubernetes-sigs/descheduler/commit/0aa233415e6834ec063d5ca71d9faa2f5c790f87#diff-02ecb8f9f97d49abacef24f8029abd43aba6605915e3004da0954e8adcdbac6fL7 updated the raw RBAC file to use the new events.k8s.io group but did not update the Helm chart.

@tomsunyu, based on your RBAC values it looks like you're using descheduler v0.25+ (or at least the RBAC/helm chart role from the v0.25 release), which is not supported on k8s 1.21. Please use v0.23.1 at the latest, which has the matching roles that fixed your problem:

  • https://github.com/kubernetes-sigs/descheduler/blob/v0.23.1/kubernetes/base/rbac.yaml#L7
  • https://github.com/kubernetes-sigs/descheduler/blob/v0.23.1/charts/descheduler/templates/clusterrole.yaml#L9

@4c74356b41, I opened https://github.com/kubernetes-sigs/descheduler/pull/990 and https://github.com/kubernetes-sigs/descheduler/pull/989 to update the helm chart to use the right group. This should be chart v0.25.2 once the branch PR merges. From what I can tell, we don't need both api groups, just a consistent role across the manifests we provide.
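For reference, pinning the chart to a matching release looks roughly like this (repo URL from the descheduler README; adjust the --version value if the chart version differs from the app version):

# Install a descheduler release compatible with k8s 1.21
helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/
helm upgrade --install descheduler descheduler/descheduler \
  --namespace kube-system --version 0.23.1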

damemi avatar Oct 17 '22 13:10 damemi

OK, I will test again.

tomsunyu avatar Oct 18 '22 00:10 tomsunyu

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 16 '23 01:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 15 '23 02:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 17 '23 02:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 17 '23 02:03 k8s-ci-robot