kube-aws-autoscaler
Scale down: implement scoring logic to select instance to terminate
Idea: the autoscaler could potentially make an informed decision about which EC2 instance to terminate and use https://docs.aws.amazon.com/AutoScaling/latest/APIReference/API_TerminateInstanceInAutoScalingGroup.html to terminate the selected node. The node selection could use a scoring function to honor attached storage (EBS) and critical pods. The node with the lowest score would be selected for termination, e.g. (a sketch follows the list below):
- add 1 to the score for every pod running on the node
- add 10 to the score for every critical pod running on the node
- add 10 to the score for every attached volume (EBS)
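A rough sketch of what such a scoring step might look like, assuming the pod and volume data has already been collected from the Kubernetes and AWS APIs; the weights, the critical-pod annotation, and all names below are illustrative assumptions, not existing kube-aws-autoscaler code:

```python
# Hypothetical scoring sketch: the node with the lowest score is terminated first.
import boto3

POD_WEIGHT = 1
CRITICAL_POD_WEIGHT = 10
EBS_VOLUME_WEIGHT = 10


def is_critical(pod):
    # Assumption: critical pods carry this annotation, as in the
    # rescheduling-for-critical-pods design proposal.
    annotations = pod.get('metadata', {}).get('annotations', {})
    return 'scheduler.alpha.kubernetes.io/critical-pod' in annotations


def node_score(pods_on_node, attached_ebs_volumes):
    '''Termination score for a single node.'''
    score = POD_WEIGHT * len(pods_on_node)
    score += CRITICAL_POD_WEIGHT * sum(1 for pod in pods_on_node if is_critical(pod))
    score += EBS_VOLUME_WEIGHT * len(attached_ebs_volumes)
    return score


def terminate_lowest_scoring_node(nodes):
    '''nodes: mapping of EC2 instance ID -> (pods_on_node, attached_ebs_volumes).'''
    instance_id = min(nodes, key=lambda i: node_score(*nodes[i]))
    # TerminateInstanceInAutoScalingGroup lets the ASG also decrement its desired capacity.
    boto3.client('autoscaling').terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=True,
    )
    return instance_id
```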
This is just an idea and has the drawback of adding complexity where we should theoretically treat all nodes equally.
Something to consider: such a scoring function might lead to "hot spot" nodes accumulating pods/volumes over time (and therefore getting higher scores that prevent their termination). This could result in less balanced cluster utilization and nodes not getting recycled.
The design proposal for rescheduling critical pods also mentions scoring for nodes: https://github.com/kubernetes/community/blob/930ce65595a3f7ce1c49acfac711fee3a25f5670/contributors/design-proposals/scheduling/rescheduling-for-critical-pods.md
This issue might get more attention in the light of stateful apps like PostgreSQL running on the cluster.
I would not do it based on attached EBS volumes, but on node labels. This would give more flexibility. Maybe your DBs are running on the new bare metal hypervisors with local disks.
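As a hedged sketch of that label-based variant: instead of counting EBS volumes, the score could penalize nodes carrying an operator-defined label. The label key and weight below are made-up examples, not an existing convention:

```python
# Hypothetical label-based scoring term; labeled nodes score high and are terminated last.
STATEFUL_NODE_LABEL = 'node-role/stateful'   # assumed, operator-defined label key


def label_score(node_labels):
    '''node_labels: the node's Kubernetes label dict.'''
    return 50 if node_labels.get(STATEFUL_NODE_LABEL) == 'true' else 0
```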
Along those lines: I provision temporary testing environments to automate testing when merging our code. The autoscaler allows us to avoid queuing up tests and instead spin up as many environments as needed for each commit.
This includes many components, but our Cassandra DB and Flink are not as resilient, and tests will fail if those pods get deleted. Once the test is over, the entire environment gets torn down.
I would somehow need to group the critical pods (Cassandra DB + Flink) together on nodes with the same label, and have those nodes get the highest score, so they are only deleted if nothing else can be deleted?
OR I need a way to NOT terminate an EC2 instance which is running a critical resource of my dynamic automated test environment. If a test environment exists, due to its ephemeral nature, it is definitely being actively used for tests. Some of its components need protection.
Maybe this is a specific use case, but I'm hoping someone has some suggestions for me.
@VinceMD there is an EC2 Auto Scaling feature (instance scale-in protection) to protect instances from being terminated on scale-in. You might want to try this, or use PDBs (PodDisruptionBudgets) to make sure only one instance at a time is "not ready", if that is good enough.
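For reference, a minimal boto3 sketch of that scale-in protection; the ASG name and instance ID are placeholders:

```python
# Sketch using the EC2 Auto Scaling SetInstanceProtection API via boto3.
import boto3

autoscaling = boto3.client('autoscaling')


def set_scale_in_protection(asg_name, instance_id, protected=True):
    '''Mark an instance so the ASG skips it when scaling in.'''
    autoscaling.set_instance_protection(
        AutoScalingGroupName=asg_name,
        InstanceIds=[instance_id],
        ProtectedFromScaleIn=protected,
    )


# Example: protect the node hosting the critical Cassandra/Flink pods
# (placeholder ASG name and instance ID).
# set_scale_in_protection('test-env-asg', 'i-0123456789abcdef0')
```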