moon
Moon browsers are blocking GKE cluster scale-down
Hi there,
Recently we have been getting warnings from GKE about Moon browser pods blocking cluster scale-down:
"Pod is blocking scale down because it’s not backed by a controller"
{
  "insertId": "96beaab8-e3c3-4773-a733-f9559c041b0f@a1",
  "jsonPayload": {
    "noDecisionStatus": {
      "measureTime": "1643992124",
      "noScaleDown": {
        "nodes": [
          {
            "node": {
              "mig": {
                "zone": "europe-west4-b",
                "name": "gke-hybris-cluster-n-hybris-node-pool-702525f4-grp",
                "nodepool": "hybris-node-pool-nonprod"
              },
              "cpuRatio": 43,
              "name": "gke-hybris-cluster-n-hybris-node-pool-702525f4-cqjm",
              "memRatio": 43
            },
            "reason": {
              "parameters": [
                "chrome-95-0-7d3784d7-9ccd-4510-8006-a37f60eb7c22"
              ],
              "messageId": "no.scale.down.node.pod.not.backed.by.controller"
            }
          }
        ],
        "nodesTotalCount": 1
      }
    }
  },
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "cluster_name": "hybris-cluster-nonprod",
      "location": "europe-west4",
      "project_id": "hybris-prod-0815"
    }
  },
  "timestamp": "2022-02-04T16:28:44.919959722Z",
  "logName": "projects/hybris-prod-0815/logs/container.googleapis.com%2Fcluster-autoscaler-visibility",
  "receiveTimestamp": "2022-02-04T16:28:45.608040844Z"
}
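For anyone triaging several of these entries, here is a minimal Python sketch that pulls the node name, the reason and the blocking pod names out of a cluster-autoscaler visibility entry. The function name `blocking_pods` and the trimmed `SAMPLE` are made up for illustration; the field names are taken from the entry above.

```python
import json


def blocking_pods(entry: dict) -> list:
    """Return (node name, reason messageId, parameters) per node that could not be scaled down."""
    no_scale_down = (
        entry.get("jsonPayload", {})
        .get("noDecisionStatus", {})
        .get("noScaleDown", {})
    )
    results = []
    for item in no_scale_down.get("nodes", []):
        reason = item.get("reason", {})
        results.append(
            (
                item.get("node", {}).get("name", ""),
                reason.get("messageId", ""),
                list(reason.get("parameters", [])),
            )
        )
    return results


# Trimmed copy of the log entry above (only the fields the function reads).
SAMPLE = json.loads("""
{
  "jsonPayload": {
    "noDecisionStatus": {
      "noScaleDown": {
        "nodes": [
          {
            "node": {"name": "gke-hybris-cluster-n-hybris-node-pool-702525f4-cqjm"},
            "reason": {
              "parameters": ["chrome-95-0-7d3784d7-9ccd-4510-8006-a37f60eb7c22"],
              "messageId": "no.scale.down.node.pod.not.backed.by.controller"
            }
          }
        ],
        "nodesTotalCount": 1
      }
    }
  }
}
""")

for node, message_id, pods in blocking_pods(SAMPLE):
    print(node, message_id, pods)
```

In this entry the only parameter is the browser pod name, so the output directly lists the pod that keeps the node alive.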
We can try fixing this with an annotation like cluster-autoscaler.kubernetes.io/safe-to-evict: "true",
but maybe there is a better solution?
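To make the annotation idea concrete, here is a small Python sketch that builds the corresponding JSON merge patch body. The helper name `safe_to_evict_patch` is hypothetical; only the annotation key and value come from the message above.

```python
import json

# Annotation key as quoted above.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def safe_to_evict_patch() -> dict:
    """Build a merge patch that marks a pod as safe to evict."""
    # Annotation values must be strings, hence "true" rather than True.
    return {"metadata": {"annotations": {SAFE_TO_EVICT: "true"}}}


print(json.dumps(safe_to_evict_patch()))
```

The printed JSON could be passed to `kubectl patch pod <pod-name> --type merge -p '<json>'` to try it on a single pod. Browser pods are short-lived, so patching them one by one is probably only useful as an experiment; whether Moon can set the annotation when it creates the pod is not covered here.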
Hmm, OK, I see: the real problem seems to be some stuck browser pods. The logs of these pods contain an endless list of "Waiting X server..."
entries.
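A rough way to spot such pods from their log text, as a Python sketch. The function name `looks_stuck` and the threshold of 20 repeats are arbitrary assumptions, not anything defined by Moon; the marker string is the one seen in the logs above.

```python
def looks_stuck(log_text: str, marker: str = "Waiting X server...", threshold: int = 20) -> bool:
    """Heuristic: treat a pod as stuck if the marker line repeats at least `threshold` times."""
    waiting = sum(1 for line in log_text.splitlines() if marker in line)
    return waiting >= threshold


# A log made of 30 repeated waiting lines is flagged; a short, healthy log is not.
print(looks_stuck("Waiting X server...\n" * 30))
print(looks_stuck("Waiting X server...\nready\n"))
```

The threshold would need tuning, since a healthy browser pod may also print the message a few times while starting.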
@mhubig you have to check the defender
container logs of such pods. Usually this is caused by a DNS issue or Kubernetes API overload.
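As a sketch of how the defender container logs could be fetched from a script: the Python below wraps `kubectl logs` with the `-c defender` container selector. The helper names, the `moon` namespace default and the tail length are assumptions; adjust them to the actual cluster.

```python
import subprocess


def defender_logs_cmd(pod: str, namespace: str = "moon", tail: int = 200) -> list:
    """Build the kubectl argv for reading the defender container logs of one pod."""
    # The namespace and tail defaults are assumptions, not values from this thread.
    return ["kubectl", "logs", pod, "-n", namespace, "-c", "defender", f"--tail={tail}"]


def fetch_defender_logs(pod: str, namespace: str = "moon", tail: int = 200) -> str:
    """Run kubectl and return the log text; raises CalledProcessError on failure."""
    result = subprocess.run(
        defender_logs_cmd(pod, namespace, tail),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example (needs kubectl and cluster access):
# print(fetch_defender_logs("chrome-95-0-7d3784d7-9ccd-4510-8006-a37f60eb7c22"))
```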