Karpenter and kube-scheduler Hang for ~5m when Deleting a StatefulSet PVC
Description
Observed Behavior:
I'm testing the behavior of #1018 and validating how Karpenter handles node rolling when the Node Cleanup Controller is responsible for deleting PVCs and PVs from a node that uses NVMe storage. Karpenter rolls the pods onto the new nodes, but after the new nodes come up, the Node Cleanup Controller takes a little while to fully delete the PVCs/PVs from the old node, since it does the cleanup on a poll.
When it finally does delete the PVCs, Karpenter begins reporting the following error: "ignoring pod, getting persistent volume claim \"www-web-1\", PersistentVolumeClaim \"www-web-1\" not found". This is correct, and Karpenter should keep reporting it until the pod gets a replacement PVC from the StatefulSet controller. However, once the StatefulSet controller re-creates the PVC, Karpenter still doesn't see that the pod is schedulable and waits around for ~5m before it begins launching a node for the pod.
There is no log on Karpenter's side that indicates why it is ignoring the pod or why it is not launching a node.
{"level":"DEBUG","time":"2024-02-19T20:11:02.842Z","logger":"controller.disruption","message":"ignoring pod, getting persistent volume claim \"www-web-2\", PersistentVolumeClaim \"www-web-2\" not found","commit":"3a8dd3a","pod":"default/web-2"}
{"level":"DEBUG","time":"2024-02-19T20:11:02.864Z","logger":"controller.disruption","message":"ignoring pod, getting persistent volume claim \"www-web-2\", PersistentVolumeClaim \"www-web-2\" not found","commit":"3a8dd3a","pod":"default/web-2"}
{"level":"DEBUG","time":"2024-02-19T20:11:04.752Z","logger":"controller.provisioner","message":"ignoring pod, getting persistent volume claim \"www-web-2\", PersistentVolumeClaim \"www-web-2\" not found","commit":"3a8dd3a","pod":"default/web-2"}
{"level":"DEBUG","time":"2024-02-19T20:11:14.752Z","logger":"controller.provisioner","message":"ignoring pod, getting persistent volume claim \"www-web-2\", PersistentVolumeClaim \"www-web-2\" not found","commit":"3a8dd3a","pod":"default/web-2"}
{"level":"DEBUG","time":"2024-02-19T20:11:43.031Z","logger":"controller.disruption","message":"discovered subnets","commit":"3a8dd3a","subnets":["subnet-006396cb4eae056f1 (us-west-2b)","subnet-0c31672d8f3da27ee (us-west-2b)","subnet-0e8ac99bc70aa8051 (us-west-2d)","subnet-068b2a87ec024682c (us-west-2d)","subnet-0f03dba293533e055 (us-west-2a)","subnet-0ff6eb107cd458d60 (us-west-2a)"]}
{"level":"DEBUG","time":"2024-02-19T20:11:49.218Z","logger":"controller","message":"deleted launch template","commit":"3a8dd3a","id":"lt-0747043e0ede53f50","name":"karpenter.k8s.aws/2974937103651069208"}
{"level":"ERROR","time":"2024-02-19T20:12:54.413Z","logger":"controller.nodeclaim.consistency","message":"check failed, expected 234 of resource pods, but found 110 (47.0% of expected)","commit":"3a8dd3a","nodeclaim":"default-wcpwn"}
{"level":"INFO","time":"2024-02-19T20:16:24.258Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"3a8dd3a","pods":"default/web-2","duration":"5.775834ms"}
Expected Behavior:
Karpenter should launch a node for the pod as soon as the pod becomes schedulable, i.e. as soon as its PVC has been re-created.
Reproduction Steps (Please include YAML):
Versions:
- Chart Version: v0.34.0
- Kubernetes Version (kubectl version): 1.29
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment