Daniel Kłobuszewski
/reopen
/remove-lifecycle rotten
I wonder if using the `NoExecute` taint effect instead of `NoSchedule` would be sufficient to fix this, perhaps with some configurable delay between tainting the node and actually removing it.
Today it applies a `NoSchedule` taint, manually evicts pods using the eviction API, and then deletes the node. For empty nodes, it just applies the taint and deletes the node...
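To make that ordering concrete, here is a rough sketch of the sequence described above. It is a simplified model and not the Cluster Autoscaler code: `Node` and `scaleDown` are hypothetical names made up for illustration, and the "steps" are just strings rather than real API calls. Only the `NoSchedule` effect name is a real Kubernetes value.

```go
package main

import "fmt"

// NoSchedule is the Kubernetes taint effect applied before removal.
const NoSchedule = "NoSchedule"

// Node is a hypothetical, stripped-down stand-in for a cluster node.
type Node struct {
	Name string
	Pods []string // names of pods currently running on the node
}

// scaleDown returns the ordered steps taken to remove a node:
// taint first, then one eviction per pod, then node deletion.
// For an empty node the eviction loop does nothing, so the
// result is just taint followed by delete.
func scaleDown(n Node) []string {
	steps := []string{fmt.Sprintf("taint %s with %s", n.Name, NoSchedule)}
	for _, p := range n.Pods {
		steps = append(steps, "evict "+p)
	}
	steps = append(steps, "delete "+n.Name)
	return steps
}

func main() {
	// Non-empty node: taint, evict, delete.
	for _, s := range scaleDown(Node{Name: "node-a", Pods: []string{"web-1"}}) {
		fmt.Println(s)
	}
	// Empty node: taint, delete.
	for _, s := range scaleDown(Node{Name: "node-b"}) {
		fmt.Println(s)
	}
}
```

In this model the empty-node path has no eviction step at all, so nothing between the taint and the delete looks at the node's pods again.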
Ok, I think with that this becomes a fairly well-defined task, let's see if someone would be able to pick it up. Hopefully the change can be confined just to...
Hi @jan-skarupa, thanks for looking into this! Interesting, so it looks like VM deletion just causes the OS to send SIGTERM to the kubelet, which then [initiates graceful shutdown in 1.21+...
To avoid the race condition entirely, we would have to separate tainting from drain and deletion. That would require changes to the ScaleDown interface so that the Actuator would have two separate...
@jan-skarupa are you up for this? It is definitely a bigger change than just adding `NoExecute` taint.
I talked offline about this with @MaciekPytel. We concluded that it should be both simpler and less risky if the Actuator treated an empty node becoming non-empty...
I think extending the API would be much cleaner, but the need to implement it for all cloud providers calls for a broader discussion. I added this topic to SIG...