Daemonset check tolerates taints, regardless of configured tolerations

On a dev cluster we have some Fargate 'nodes'. This causes the daemonset check to fail, which in turn triggers other alerts because of pods stuck in Pending.

In the helm chart (v53) I see a check.daemonset.tolerations property. However, filling it with a dummy placeholder (to ensure something actually gets rendered in the khcheck) seems to have no effect. Neither does specifying an empty extraEnv value for ALLOWED_TAINTS.
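
For reference, the relevant part of my values file looks roughly like this (a sketch from memory; the exact extraEnv key shape is an assumption, and the rendered result is visible in the khcheck below):

check:
  daemonset:
    enabled: true
    tolerations:
      # dummy placeholder, only there to ensure something gets rendered
      - key: key
        operator: Equal
        value: value
        effect: NoSchedule
    extraEnv:
      ALLOWED_TAINTS: ""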

Am I overlooking something or is this a bug?

Kuberhealthy version: v2.4.0

Logs of the daemonset checker pod:

# kubectl -n kuberhealthy logs pod/daemonset-1614167630

time="2021-02-24T11:53:56Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-02-24T11:53:56Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2021-02-24T11:53:56Z" level=info msg="Setting shutdown grace period to: 1m0s"
time="2021-02-24T11:53:56Z" level=info msg="Check deadline in 10m53.015724387s"
time="2021-02-24T11:53:56Z" level=info msg="Parsed POD_NAMESPACE: kuberhealthy"
time="2021-02-24T11:53:56Z" level=info msg="Performing check in kuberhealthy namespace."
time="2021-02-24T11:53:56Z" level=info msg="Setting DS pause container image to: gcr.io/google-containers/pause:3.1"
time="2021-02-24T11:53:56Z" level=info msg="Setting check daemonset name to: daemonset"
time="2021-02-24T11:53:56Z" level=info msg="Setting check priority class name to: "
time="2021-02-24T11:53:56Z" level=info msg="Kubernetes client created."
time="2021-02-24T11:53:56Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-02-24T11:53:56Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-02-24T11:53:56Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2021-02-24T11:53:56Z" level=debug msg="Allowing this check until 2021-02-24 12:04:50 +0000 UTC to finish."
time="2021-02-24T11:53:56Z" level=debug msg="Setting check ctx cancel with timeout 11m53.015724387s"
time="2021-02-24T11:53:56Z" level=info msg="Running daemonset check"
time="2021-02-24T11:53:56Z" level=info msg="Running daemonset deploy..."
time="2021-02-24T11:53:56Z" level=info msg="Deploying daemonset."
time="2021-02-24T11:53:56Z" level=debug msg="runAsUser will be set to  999"
time="2021-02-24T11:53:57Z" level=debug msg="Searching for unique taints on the cluster."
time="2021-02-24T11:53:57Z" level=info msg="Found taints to tolerate: [{eks.amazonaws.com/compute-type  fargate NoSchedule <nil>} {DeletionCandidateOfClusterAutoscaler  1614167605 PreferNoSchedule <nil>}]"
time="2021-02-24T11:53:57Z" level=info msg="Generating daemonset kubernetes spec."
time="2021-02-24T11:53:57Z" level=info msg="Deploying daemonset with tolerations:  [{eks.amazonaws.com/compute-type  fargate NoSchedule <nil>} {DeletionCandidateOfClusterAutoscaler  1614167605 PreferNoSchedule <nil>}]"
time="2021-02-24T11:53:57Z" level=debug msg="Creating Daemonset client."
time="2021-02-24T11:53:57Z" level=debug msg="Worker: waitForPodsToComeOnline started"
time="2021-02-24T11:53:57Z" level=debug msg="Waiting for all ds pods to come online"
time="2021-02-24T11:53:57Z" level=info msg="Timeout set: 10m53.015724387s for all daemonset pods to come online"
time="2021-02-24T11:53:58Z" level=debug msg="Creating Node client."
time="2021-02-24T11:53:58Z" level=debug msg="Creating Pod client."

<after a while, the following lines repeat:>

time="2021-02-24T11:54:05Z" level=debug msg="Creating Node client."
time="2021-02-24T11:54:05Z" level=debug msg="Creating Pod client."
time="2021-02-24T11:54:05Z" level=info msg="DaemonsetChecker: Daemonset check waiting for 4 pod(s) to come up on nodes [fargate-ip-10-11-70-187.eu-west-1.compute.internal fargate-ip-10-11-198-166.eu-west-1.compute.internal fargate-ip-10-11-166-31.eu-west-1.compute.internal fargate-ip-10-11-140-81.eu-west-1.compute.internal]"

Khcheck:

# kubectl -n kuberhealthy get khcheck daemonset -o yaml
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  annotations:
    meta.helm.sh/release-name: kuberhealthy
    meta.helm.sh/release-namespace: kuberhealthy
  creationTimestamp: "2020-11-30T05:26:00Z"
  generation: 4
  labels:
    app.kubernetes.io/managed-by: Helm
  name: daemonset
  namespace: kuberhealthy
  resourceVersion: "121799350"
  selfLink: /apis/comcast.github.io/v1/namespaces/kuberhealthy/khchecks/daemonset
  uid: 8830b311-32cc-11eb-8a72-0a5fc8dc502f
spec:
  podSpec:
    containers:
    - env:
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
      - name: CHECK_POD_TIMEOUT
        value: 10m
      - name: ALLOWED_TAINTS
        value: ""
      image: kuberhealthy/daemonset-check:v3.2.5
      imagePullPolicy: IfNotPresent
      name: main
      resources:
        requests:
          cpu: 10m
          memory: 50Mi
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
    securityContext:
      fsGroup: 999
      runAsUser: 999
    serviceAccountName: daemonset-khcheck
    tolerations:
    - effect: NoSchedule
      key: key
      operator: Equal
      value: value
  runInterval: 15m
  timeout: 12m

Created daemonset:

# kubectl -n kuberhealthy get daemonset.apps/daemonset-daemonset-1614167630-1614167636 -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2021-02-24T11:53:57Z"
  generation: 1
  labels:
    checkRunTime: "1614167636"
    creatingInstance: daemonset-1614167630
    kh-app: daemonset-daemonset-1614167630-1614167636
    khcheck: daemonset
    source: kuberhealthy
  name: daemonset-daemonset-1614167630-1614167636
  namespace: kuberhealthy
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: daemonset-1614167630
    uid: e5ab11ad-d96c-4074-afcc-326ebbbb235a
  resourceVersion: "121800054"
  selfLink: /apis/apps/v1/namespaces/kuberhealthy/daemonsets/daemonset-daemonset-1614167630-1614167636
  uid: 47b6abe7-fb90-42d7-bdb0-38ddcd020d87
spec:
  minReadySeconds: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      checkRunTime: "1614167636"
      creatingInstance: daemonset-1614167630
      kh-app: daemonset-daemonset-1614167630-1614167636
      khcheck: daemonset
      source: kuberhealthy
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      creationTimestamp: null
      labels:
        checkRunTime: "1614167636"
        creatingInstance: daemonset-1614167630
        kh-app: daemonset-daemonset-1614167630-1614167636
        khcheck: daemonset
        source: kuberhealthy
      name: daemonset-daemonset-1614167630-1614167636
      ownerReferences:
      - apiVersion: v1
        kind: Pod
        name: daemonset-1614167630
        uid: e5ab11ad-d96c-4074-afcc-326ebbbb235a
    spec:
      containers:
      - image: gcr.io/google-containers/pause:3.1
        imagePullPolicy: IfNotPresent
        name: sleep
        resources:
          requests:
            cpu: "0"
            memory: "0"
        securityContext:
          runAsUser: 999
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoSchedule
        key: eks.amazonaws.com/compute-type
        value: fargate
      - effect: PreferNoSchedule
        key: DeletionCandidateOfClusterAutoscaler
        value: "1614167605"
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 11
  desiredNumberScheduled: 11
  numberAvailable: 7
  numberMisscheduled: 0
  numberReady: 7
  numberUnavailable: 4
  observedGeneration: 1
  updatedNumberScheduled: 11

TBeijen avatar Feb 24 '21 12:02 TBeijen

Just to clarify: is your desired effect to not have anything scheduled on your Fargate nodes?

rjacks161 avatar Feb 25 '21 17:02 rjacks161

Just to clarify: is your desired effect to not have anything scheduled on your Fargate nodes?

Yes, and it's not even possible: EKS Fargate spawns a 'node' per pod, and that 'node' is not full-featured. It doesn't support DaemonSets at all, and you can't schedule other pods on it either.

So in the case of Fargate this can never work, due to the nature of Fargate. In general though, I can imagine that fine-grained control over the tolerations (and node selectors, etc.) the daemonset check applies would be useful, e.g. to split the daemonset check into variants targeting specific node groups.

TBeijen avatar Feb 26 '21 07:02 TBeijen

This may be best solved with a new check: #777

integrii avatar Mar 03 '21 17:03 integrii

I'm facing the same issue as @TBeijen: it's not possible to add tolerations or a nodeSelector.

In my cluster there are Linux and Windows nodes. I don't want to run the daemonset check on Windows nodes, as the pods go into ImagePullBackOff there. But I'm unable to add a toleration or nodeSelector to the daemonset check; by default it picks up all node taints and adds them as tolerations.
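
For comparison, a hand-written DaemonSet would keep its pods off Windows nodes with the standard kubernetes.io/os node label. This sketch is what I'd want the check's generated daemonset to be able to carry:

spec:
  template:
    spec:
      # standard well-known node label; keeps pods on Linux nodes only
      nodeSelector:
        kubernetes.io/os: linux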

susmitaganguli avatar Mar 08 '22 17:03 susmitaganguli

I'm also seeing this behavior, and it appears to be due to the fact that:

  • If no explicit TOLERATIONS are passed into the daemonset check, it adds all existing node taints as tolerations: https://github.com/kuberhealthy/kuberhealthy/blob/master/cmd/daemonset-check/run_check.go#L328
  • The TOLERATIONS value can only be controlled via an environment variable passed to the khcheck itself, not via the top-level tolerations parameter passed to Kuberhealthy.

I've found that if you pass a dummy toleration value (any value) into the khcheck, like this:

check:
  daemonset:
    enabled: true
    daemonset:
      extraEnvs:
        TOLERATIONS: "dummytoleration=foo"

It will prevent this default behavior and specify your "dummytoleration=foo" toleration (which looks odd, but acts as a harmless no-op) on the daemonset instead.
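
Judging by the toleration shape on the daemonset shown earlier, the no-op presumably ends up rendered as something like this (a sketch inferred from the key=value format, not verified against the parser):

tolerations:
# parsed from "dummytoleration=foo"; matches no real taint, so it's a no-op
- key: dummytoleration
  value: foo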

dansimone avatar Dec 08 '22 14:12 dansimone

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment on the issue or this will be closed in 15 days.

github-actions[bot] avatar Jan 30 '24 00:01 github-actions[bot]

This issue was closed because it has been stalled for 15 days with no activity. Please reopen and comment on the issue if you believe it should stay open.

github-actions[bot] avatar Feb 21 '24 00:02 github-actions[bot]

Thank you for the helpful comment, @dansimone, I greatly appreciate it. I'd like to see whether that behavior can be fixed so that tolerations under podSpec can be used easily.

Until then, TOLERATIONS: "dummytoleration=foo:NoExecute" or TOLERATIONS: "dummytoleration=foo:NoSchedule" are also options, in case someone wants to add an effect to those tolerations. ~~This fixed my issue by preventing Pending pods from being scheduled on Fargate nodes.~~

Edit: the tolerations prevent the creation of pods on Fargate nodes, but the khchecks still fail because of the Fargate nodes.
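
Assuming the same key=value:effect parsing, the NoSchedule variant presumably renders as:

tolerations:
# parsed from "dummytoleration=foo:NoSchedule"
- key: dummytoleration
  value: foo
  effect: NoSchedule

The pods stay off the Fargate nodes, but the check apparently still counts those nodes when waiting for daemonset pods (see the "waiting for 4 pod(s)" log above), which is why the khcheck keeps failing.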

acaremrullah avatar Mar 27 '24 23:03 acaremrullah