elastic-ci-stack-for-aws

Parameterise ASG Termination Policies

Open mf-lit opened this issue 1 year ago • 1 comment

This change adds a Parameter for setting the ASG Termination Policies, with defaults to keep it backward-compatible.

FWIW, our particular use case for this is so we can prioritise the NewestInstance policy, meaning that we keep instances for longer in order to maintain a warm Docker cache (our queue isn't busy enough to justify a dedicated queue for this).

mf-lit, Apr 21 '23 10:04

Hi @mf-lit. Thanks for making this PR. Unfortunately, we think it's not likely to work as expected.

So the way the instances in the elastic stack are terminated is not controlled by the ASG, because the ASG has NewInstancesProtectedFromScaleIn set to true. The instances terminate themselves when their agents are no longer running jobs (and ScaleInIdlePeriod has passed).

Furthermore, the configuration of buildkite-agent-scaler does not decrease the ASG's desired capacity. It's only the instances terminating themselves that decrease the desired capacity.

With that in mind, we don't think the termination policies will be used, as the scale-in pathway does not involve the ASG evaluating them.
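
To make that concrete, here is a rough sketch (not code from the stack itself) of why the policies have nothing to act on. It assumes a group description shaped like one entry of the `AutoScalingGroups` list returned by the AWS `describe_auto_scaling_groups` call; the function name and the example data are hypothetical.

```python
from typing import Any


def instances_eligible_for_policy(group: dict[str, Any]) -> list[str]:
    """Return IDs of instances the ASG could choose via its termination policies.

    The ASG skips instances that have scale-in protection when it scales in,
    so termination policies only ever choose among the unprotected ones.
    """
    return [
        inst["InstanceId"]
        for inst in group.get("Instances", [])
        if not inst.get("ProtectedFromScaleIn", False)
    ]


# Hypothetical data: every instance launched with
# NewInstancesProtectedFromScaleIn=true starts out protected.
example_group = {
    "AutoScalingGroupName": "example-agent-asg",
    "NewInstancesProtectedFromScaleIn": True,
    "TerminationPolicies": ["NewestInstance"],
    "Instances": [
        {"InstanceId": "i-aaa", "ProtectedFromScaleIn": True},
        {"InstanceId": "i-bbb", "ProtectedFromScaleIn": True},
    ],
}

eligible = instances_eligible_for_policy(example_group)
print(f"Instances the termination policies could select: {eligible}")
```

With every instance protected, the list is empty regardless of which policies are configured, which is the situation described above.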

The reason for this admittedly convoluted situation is that we don't want instances to be terminated while they are running jobs. If that's not something you are concerned with, we can consider a mode of operation where the termination policies will have an effect, but you will likely need to make significant changes to achieve that.

triarius, Aug 30 '23 02:08