Daemonset check tolerates taints, regardless of configured tolerations
On a dev cluster we have some Fargate 'nodes'. This causes the daemonset check to fail, which in turn triggers other alerts because of pods stuck in Pending.
In the Helm chart (v53) I see a check.daemonset.tolerations property. However, filling that with a dummy placeholder (to make sure something actually gets rendered into the khcheck) seems to have no effect. Neither does specifying an empty extraEnv value for ALLOWED_TAINTS.
Am I overlooking something, or is this a bug?
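For illustration, roughly the values being used (a sketch only; the key names are taken from the chart properties mentioned above and from the rendered khcheck further down, so the exact structure may differ from the chart):
check:
  daemonset:
    enabled: true
    # dummy placeholder toleration, only present to force something to be rendered into the khcheck
    tolerations:
      - key: key
        operator: Equal
        value: value
        effect: NoSchedule
    # attempt to clear the taints the check tolerates by default
    extraEnv:
      ALLOWED_TAINTS: ""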
Kuberhealthy version: v2.4.0
Logs of daemonset checker pod:
# kubectl -n kuberhealthy logs pod/daemonset-1614167630
time="2021-02-24T11:53:56Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-02-24T11:53:56Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2021-02-24T11:53:56Z" level=info msg="Setting shutdown grace period to: 1m0s"
time="2021-02-24T11:53:56Z" level=info msg="Check deadline in 10m53.015724387s"
time="2021-02-24T11:53:56Z" level=info msg="Parsed POD_NAMESPACE: kuberhealthy"
time="2021-02-24T11:53:56Z" level=info msg="Performing check in kuberhealthy namespace."
time="2021-02-24T11:53:56Z" level=info msg="Setting DS pause container image to: gcr.io/google-containers/pause:3.1"
time="2021-02-24T11:53:56Z" level=info msg="Setting check daemonset name to: daemonset"
time="2021-02-24T11:53:56Z" level=info msg="Setting check priority class name to: "
time="2021-02-24T11:53:56Z" level=info msg="Kubernetes client created."
time="2021-02-24T11:53:56Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-02-24T11:53:56Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-02-24T11:53:56Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2021-02-24T11:53:56Z" level=debug msg="Allowing this check until 2021-02-24 12:04:50 +0000 UTC to finish."
time="2021-02-24T11:53:56Z" level=debug msg="Setting check ctx cancel with timeout 11m53.015724387s"
time="2021-02-24T11:53:56Z" level=info msg="Running daemonset check"
time="2021-02-24T11:53:56Z" level=info msg="Running daemonset deploy..."
time="2021-02-24T11:53:56Z" level=info msg="Deploying daemonset."
time="2021-02-24T11:53:56Z" level=debug msg="runAsUser will be set to 999"
time="2021-02-24T11:53:57Z" level=debug msg="Searching for unique taints on the cluster."
time="2021-02-24T11:53:57Z" level=info msg="Found taints to tolerate: [{eks.amazonaws.com/compute-type fargate NoSchedule <nil>} {DeletionCandidateOfClusterAutoscaler 1614167605 PreferNoSchedule <nil>}]"
time="2021-02-24T11:53:57Z" level=info msg="Generating daemonset kubernetes spec."
time="2021-02-24T11:53:57Z" level=info msg="Deploying daemonset with tolerations: [{eks.amazonaws.com/compute-type fargate NoSchedule <nil>} {DeletionCandidateOfClusterAutoscaler 1614167605 PreferNoSchedule <nil>}]"
time="2021-02-24T11:53:57Z" level=debug msg="Creating Daemonset client."
time="2021-02-24T11:53:57Z" level=debug msg="Worker: waitForPodsToComeOnline started"
time="2021-02-24T11:53:57Z" level=debug msg="Waiting for all ds pods to come online"
time="2021-02-24T11:53:57Z" level=info msg="Timeout set: 10m53.015724387s for all daemonset pods to come online"
time="2021-02-24T11:53:58Z" level=debug msg="Creating Node client."
time="2021-02-24T11:53:58Z" level=debug msg="Creating Pod client."
(after a while, the following lines repeat:)
time="2021-02-24T11:54:05Z" level=debug msg="Creating Node client."
time="2021-02-24T11:54:05Z" level=debug msg="Creating Pod client."
time="2021-02-24T11:54:05Z" level=info msg="DaemonsetChecker: Daemonset check waiting for 4 pod(s) to come up on nodes [fargate-ip-10-11-70-187.eu-west-1.compute.internal fargate-ip-10-11-198-166.eu-west-1.compute.internal fargate-ip-10-11-166-31.eu-west-1.compute.internal fargate-ip-10-11-140-81.eu-west-1.compute.internal]"
Khcheck:
# kubectl -n kuberhealthy get khcheck daemonset -o yaml
apiVersion: comcast.github.io/v1
kind: KuberhealthyCheck
metadata:
  annotations:
    meta.helm.sh/release-name: kuberhealthy
    meta.helm.sh/release-namespace: kuberhealthy
  creationTimestamp: "2020-11-30T05:26:00Z"
  generation: 4
  labels:
    app.kubernetes.io/managed-by: Helm
  name: daemonset
  namespace: kuberhealthy
  resourceVersion: "121799350"
  selfLink: /apis/comcast.github.io/v1/namespaces/kuberhealthy/khchecks/daemonset
  uid: 8830b311-32cc-11eb-8a72-0a5fc8dc502f
spec:
  podSpec:
    containers:
    - env:
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
      - name: CHECK_POD_TIMEOUT
        value: 10m
      - name: ALLOWED_TAINTS
        value: ""
      image: kuberhealthy/daemonset-check:v3.2.5
      imagePullPolicy: IfNotPresent
      name: main
      resources:
        requests:
          cpu: 10m
          memory: 50Mi
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
    securityContext:
      fsGroup: 999
      runAsUser: 999
    serviceAccountName: daemonset-khcheck
    tolerations:
    - effect: NoSchedule
      key: key
      operator: Equal
      value: value
  runInterval: 15m
  timeout: 12m
Created daemonset:
# kubectl -n kuberhealthy get daemonset.apps/daemonset-daemonset-1614167630-1614167636 -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2021-02-24T11:53:57Z"
  generation: 1
  labels:
    checkRunTime: "1614167636"
    creatingInstance: daemonset-1614167630
    kh-app: daemonset-daemonset-1614167630-1614167636
    khcheck: daemonset
    source: kuberhealthy
  name: daemonset-daemonset-1614167630-1614167636
  namespace: kuberhealthy
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: daemonset-1614167630
    uid: e5ab11ad-d96c-4074-afcc-326ebbbb235a
  resourceVersion: "121800054"
  selfLink: /apis/apps/v1/namespaces/kuberhealthy/daemonsets/daemonset-daemonset-1614167630-1614167636
  uid: 47b6abe7-fb90-42d7-bdb0-38ddcd020d87
spec:
  minReadySeconds: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      checkRunTime: "1614167636"
      creatingInstance: daemonset-1614167630
      kh-app: daemonset-daemonset-1614167630-1614167636
      khcheck: daemonset
      source: kuberhealthy
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
      creationTimestamp: null
      labels:
        checkRunTime: "1614167636"
        creatingInstance: daemonset-1614167630
        kh-app: daemonset-daemonset-1614167630-1614167636
        khcheck: daemonset
        source: kuberhealthy
      name: daemonset-daemonset-1614167630-1614167636
      ownerReferences:
      - apiVersion: v1
        kind: Pod
        name: daemonset-1614167630
        uid: e5ab11ad-d96c-4074-afcc-326ebbbb235a
    spec:
      containers:
      - image: gcr.io/google-containers/pause:3.1
        imagePullPolicy: IfNotPresent
        name: sleep
        resources:
          requests:
            cpu: "0"
            memory: "0"
        securityContext:
          runAsUser: 999
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoSchedule
        key: eks.amazonaws.com/compute-type
        value: fargate
      - effect: PreferNoSchedule
        key: DeletionCandidateOfClusterAutoscaler
        value: "1614167605"
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 11
  desiredNumberScheduled: 11
  numberAvailable: 7
  numberMisscheduled: 0
  numberReady: 7
  numberUnavailable: 4
  observedGeneration: 1
  updatedNumberScheduled: 11
Just to clarify, is your desired effect to not have anything scheduled on your Fargate nodes?
Yes. It's not even possible: EKS Fargate spawns a 'node' per pod that isn't a full-featured node. It doesn't support DaemonSets at all, and you can't schedule other pods on it either.
So in the case of Fargate this can never work, due to the nature of Fargate. But in general I can imagine that fine-grained control over the tolerations (and node selectors, etc.) the daemonset check applies would be useful, e.g. to split the daemonset check into variants targeting specific node groups.
This may be best solved with a new check: #777
I'm facing the same issue as @TBeijen. It's not possible to add tolerations or a nodeSelector.
In my cluster there are Linux and Windows nodes. I don't want to run the daemonset check on the Windows nodes, as those pods go into ImagePullBackOff. But I'm unable to add a toleration or nodeSelector to the daemonset check; by default it picks up all node taints and adds them as tolerations.
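For context, this is the kind of node selection that would normally keep a DaemonSet off the Windows nodes; the problem here is that the check currently offers no way to apply something like it to the DaemonSets it creates (the nodeSelector and the kubernetes.io/os label below are standard Kubernetes pod template fields, not an existing check option):
# shown only to illustrate the desired outcome; the daemonset check
# does not currently expose a way to set these on its generated DaemonSet
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/os: linux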
I'm also seeing this behavior, and it appears to be due to the fact that:
- If no explicit TOLERATIONS are passed into the daemonset check, it will add all existing node taints: https://github.com/kuberhealthy/kuberhealthy/blob/master/cmd/daemonset-check/run_check.go#L328
- This TOLERATIONS setting can only be controlled via an environment variable passed to the khcheck itself - not the top-level tolerations parameter passed to kuberhealthy itself.
I've found that if you pass in some (any) dummy toleration value into the khcheck, like this:
check:
  daemonset:
    enabled: true
    daemonset:
      extraEnvs:
        TOLERATIONS: "dummytoleration=foo"
It will prevent this default behavior and instead put your "dummytoleration=foo" toleration (which looks odd, but acts as a harmless no-op) on the daemonset.
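Presumably that ends up on the generated DaemonSet as something like the toleration below (shape inferred from the tolerations visible in the DaemonSet dump earlier in this issue; the exact fields the check sets aren't shown here):
tolerations:
# tolerates a taint key that doesn't exist on any node, so it changes nothing in practice
- key: dummytoleration
  value: foo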
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment on the issue or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity. Please reopen and comment on the issue if you believe it should stay open.
Thank you for the helpful comment, @dansimone. I greatly appreciate it. I'd like to see whether that behavior can be fixed and whether tolerations can easily be used under podSpec.
Until then, TOLERATIONS: "dummytoleration=foo:NoExecute" or TOLERATIONS: "dummytoleration=foo:NoSchedule" are also options in case someone wants to add an effect to those tolerations. ~~This fixed my issue and prevented Pending pods from being scheduled on Fargate nodes.~~
Edit: the tolerations prevent the creation of pods on Fargate nodes, but the khchecks still fail due to the Fargate nodes.
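For reference, on the khcheck itself that corresponds to an extra entry in the env list of the main container, roughly like this (following the env shape visible in the khcheck dump above; the value uses the key=value:effect format described in the comments above):
spec:
  podSpec:
    containers:
    - name: main
      env:
      # effect appended after the colon
      - name: TOLERATIONS
        value: "dummytoleration=foo:NoSchedule"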