Hooks to run scripts at different stages

Open challapradyumna opened this issue 3 years ago • 5 comments

FEATURE REQUEST:

What happened: Draining and removing nodes from an ASG also requires us to update our monitoring systems so they stop monitoring the node, plus a few other steps, before the node can be safely removed from circulation.

What you expected to happen: If there were flags that let shell scripts be attached to lifecycle-manager at different stages, that would make lifecycle-manager extensible for various use cases.

challapradyumna avatar Dec 11 '20 08:12 challapradyumna

Hi @challapradyumna, great idea and an interesting use case. Do you need to run a script on the controller, or on the terminating node?

A simple script implementation might be a bit problematic, since this is a service and not a controller, and there is no custom resource, so the only interface is the flags passed in. Allowing arbitrary script execution might also have security implications.

I think one possible approach to have something secure and configurable that answers this use case is to use SSM send-command.

The user could then specify a pre-created script to execute via a flag, e.g. --ssm-finalize-script, which would invoke the script and wait for completion.

This would require users to integrate SSM on their AMIs, but it would be easy to implement and relatively more secure, since the script is pre-created.
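A minimal sketch of what "invoke the script and wait for completion" could look like with SSM send-command. It is illustrative only: `run_finalize_script`, its parameters, and the polling intervals are hypothetical, and `ssm` is assumed to be a boto3 SSM client (or anything exposing the same `send_command` and `get_command_invocation` calls). The `--ssm-finalize-script` flag itself is only a proposal in this thread.

```python
import time

# Invocation statuses after which SSM will not change the result any further.
TERMINAL_STATUSES = {"Success", "Cancelled", "TimedOut", "Failed"}


def run_finalize_script(ssm, instance_id, script_path,
                        timeout_s=300, poll_s=5, sleep=time.sleep):
    """Run a pre-created script on the instance via SSM and wait for the result.

    Returns True only if the invocation ends with status "Success".
    """
    resp = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",       # AWS-managed SSM document
        Parameters={"commands": [script_path]},  # path of the script already on the AMI
    )
    command_id = resp["Command"]["CommandId"]

    waited = 0
    while waited < timeout_s:
        sleep(poll_s)
        waited += poll_s
        # Note: a real client may need to retry here if the invocation
        # has not been registered yet right after send_command.
        inv = ssm.get_command_invocation(CommandId=command_id,
                                         InstanceId=instance_id)
        if inv["Status"] in TERMINAL_STATUSES:
            return inv["Status"] == "Success"
    return False  # treated as a failure: the script did not finish in time
```

With boto3 installed, the client would be created with `boto3.client("ssm")`; the service would then decide whether a `False` result should block termination or only be logged.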

WDYT?

eytan-avisror avatar Dec 11 '20 19:12 eytan-avisror

At least for our use case, it's about muting the instance on Datadog and other third-party services; nothing needs to run on the instance itself.

Yes and no. The SSM integration makes sense, but it becomes a prerequisite for anyone who wants to use this feature.

I'm thinking more along the lines of calling a webhook or kicking off a job inside the cluster itself, sending the instance details as a parameter.
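As a rough illustration of the webhook idea, the sketch below posts the instance details as JSON. Everything named here is an assumption for the example: the function names, the payload fields (`instance_id`, `node_name`, `event`), and the URL are hypothetical and not part of lifecycle-manager.

```python
import json
import urllib.request


def build_webhook_request(url, instance_id, node_name, event):
    """Build a POST request carrying the instance details as a JSON body."""
    payload = {"instance_id": instance_id, "node_name": node_name, "event": event}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(url, instance_id, node_name, event, timeout_s=10):
    """Send the webhook; return True on a 2xx response, False on any error.

    The caller decides whether False should block termination or be ignored
    (a best-effort hook).
    """
    req = build_webhook_request(url, instance_id, node_name, event)
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # urllib raises URLError/HTTPError (both OSError subclasses) on
        # connection problems, timeouts, and 4xx/5xx responses.
        return False
```

A receiver behind that URL could then mute the host in the monitoring system before the node is terminated.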

challapradyumna avatar Dec 11 '20 20:12 challapradyumna

Are you referring to running this webhook inside the lifecycle-manager pod, or from the terminating instance? Is this supposed to block the instance termination? Is it a 'best-effort' attempt, or do you need to validate the call response?

eytan-avisror avatar Dec 11 '20 20:12 eytan-avisror

One other option would be to label the node (or add an annotation) when it is about to be drained and terminated. Most Kubernetes-aware tools allow for configuration based on that.
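For illustration, labeling a node amounts to a small merge patch against the node object. The sketch below only builds that request; the label key `example.com/lifecycle-state` and the value `draining` are made up for the example, and no such label is defined by lifecycle-manager in this thread.

```python
import json


def drain_label_patch(node_name, key="example.com/lifecycle-state", value="draining"):
    """Return (path, headers, body) for a PATCH that sets one label on a node."""
    path = f"/api/v1/nodes/{node_name}"
    # JSON merge patch: only the listed label is added or overwritten.
    headers = {"Content-Type": "application/merge-patch+json"}
    body = json.dumps({"metadata": {"labels": {key: value}}})
    return path, headers, body
```

The same effect from the command line would be `kubectl label node <node-name> example.com/lifecycle-state=draining`, after which label-aware tooling can react to the node's state.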

Would that help?

shrinandj avatar Dec 11 '20 21:12 shrinandj

I'm looking for something more like a flag, e.g. --post-drain project/mute-instance. This would run the specified container as a job. I get the issue with shell scripts: they become too much of a hassle to maintain in the long run.
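To make the proposal concrete, here is a sketch of the Job that such a flag might create for the given image. It is an assumption throughout: `--post-drain` does not exist, and the function name, Job naming scheme, environment variable names, and retry/TTL values are hypothetical.

```python
import json


def post_drain_job(image, instance_id, node_name, namespace="default"):
    """Build a batch/v1 Job manifest that runs `image` once for a drained node."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            # Object names must be lowercase and at most 63 characters.
            "name": f"post-drain-{instance_id}".lower()[:63],
            "namespace": namespace,
        },
        "spec": {
            "backoffLimit": 2,                # retry a failed hook a couple of times
            "ttlSecondsAfterFinished": 3600,  # let the cluster clean up finished Jobs
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "hook",
                        "image": image,
                        # Instance details are passed to the container as env vars.
                        "env": [
                            {"name": "INSTANCE_ID", "value": instance_id},
                            {"name": "NODE_NAME", "value": node_name},
                        ],
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = post_drain_job("project/mute-instance", "i-0abc123",
                              "ip-10-0-0-1.ec2.internal")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with `kubectl apply -f -`, so the hook logic lives in a versioned container image rather than in shell scripts.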

challapradyumna avatar Jan 06 '21 09:01 challapradyumna