mup-aws-beanstalk
Graceful shutdown SSM commands get stuck on terminated instances
Since graceful shutdown runs SSM commands on all instances matching a certain tag, even instances that are "Terminated" match that tag. Running the command on a terminated instance then results in a timeout (the AWS SSM default is 3600 seconds, i.e. 1 hour).
See this thread for others experiencing the issue with SSM on terminated instances.
This can block graceful shutdown commands.
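To illustrate the matching problem, here is a minimal Python sketch of a tag-only selector. It assumes the instance data has the shape of an EC2 DescribeInstances response (Reservations → Instances, each with InstanceId, State.Name and Tags). The tag key and value are hypothetical placeholders, not necessarily the tag mup-aws-beanstalk actually uses. Because the selector only looks at tags, the terminated instance is selected alongside the live one, which mirrors the behaviour described above.

```python
# Hypothetical tag used to select the environment's instances (assumption).
TAG_KEY = "elasticbeanstalk:environment-name"
TAG_VALUE = "my-app-env"


def instances_matching_tag(describe_response, key, value):
    """Return the IDs of every instance tagged key=value, whatever its state."""
    matched = []
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(key) == value:
                matched.append(instance["InstanceId"])
    return matched


# Made-up data in the DescribeInstances shape: one live, one terminated instance.
tag = [{"Key": TAG_KEY, "Value": TAG_VALUE}]
sample = {
    "Reservations": [
        {
            "Instances": [
                {"InstanceId": "i-aaa", "State": {"Name": "running"}, "Tags": tag},
                {"InstanceId": "i-bbb", "State": {"Name": "terminated"}, "Tags": tag},
            ]
        }
    ]
}

print(instances_matching_tag(sample, TAG_KEY, TAG_VALUE))  # ['i-aaa', 'i-bbb']
```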
Since these commands usually take only 3-4 seconds to run, I think it might be worth putting a timeout on the SSM document, for example:
{
...
"mainSteps": [
{
"name": "runCommand",
"action": "aws:runCommand",
"timeoutSeconds": 10,
"inputs": ...
}
]
}
I'll test it out to confirm, but let me know if you think that makes sense.
Confirmed that this resolves the issue. It's not perfect because you lose a few extra seconds while waiting for the timeout, and that time cuts into the graceful shutdown window.
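The trade-off can be put in numbers with a toy model. It assumes that an invocation is only finished once every targeted instance has either completed or hit the step timeout, and that a terminated instance never completes on its own; both are simplifying assumptions, not documented SSM internals. With the 3-4 second runtimes mentioned above, the default timeout makes the command wait an hour, while "timeoutSeconds": 10 caps the wait at 10 seconds, still about 6 seconds longer than the healthy instances need.

```python
DEFAULT_TIMEOUT_S = 3600  # SSM default cited in the issue (1 hour)


def wait_seconds(runtimes, timeout_s):
    """Seconds until every target has finished or timed out.

    runtimes: per-instance runtime in seconds, or None for an instance that
    never responds (e.g. a terminated one), which waits the full timeout.
    """
    return max(
        (timeout_s if r is None else min(r, timeout_s) for r in runtimes),
        default=0,
    )


# Two healthy instances (3-4 s each) plus one terminated instance.
targets = [3, 4, None]
print(wait_seconds(targets, DEFAULT_TIMEOUT_S))  # 3600
print(wait_seconds(targets, 10))                 # 10
```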
I think the graceful shutdown setup ultimately needs a way to either:
A. Filter out terminated instances in the CloudWatch event rule, or
B. Fail immediately when SSM runs against a terminated instance.
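A rough sketch of the idea behind option A follows. It does not filter inside the CloudWatch event rule itself; instead it assumes the command could be sent to an explicit list of instance IDs built beforehand, dropping any instance whose State.Name is "terminated". As before, the DescribeInstances-shaped data and the tag key/value are hypothetical.

```python
TAG_KEY = "elasticbeanstalk:environment-name"  # hypothetical tag key
TAG_VALUE = "my-app-env"                       # hypothetical tag value
SKIP_STATES = {"terminated"}                   # EC2 state names to filter out


def target_instance_ids(describe_response, key, value, skip_states=SKIP_STATES):
    """Return IDs of instances tagged key=value, excluding skipped states."""
    ids = []
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            state = instance.get("State", {}).get("Name")
            if tags.get(key) == value and state not in skip_states:
                ids.append(instance["InstanceId"])
    return ids


tag = [{"Key": TAG_KEY, "Value": TAG_VALUE}]
sample = {
    "Reservations": [
        {
            "Instances": [
                {"InstanceId": "i-aaa", "State": {"Name": "running"}, "Tags": tag},
                {"InstanceId": "i-bbb", "State": {"Name": "terminated"}, "Tags": tag},
                {"InstanceId": "i-ccc", "State": {"Name": "shutting-down"}, "Tags": tag},
            ]
        }
    ]
}

print(target_instance_ids(sample, TAG_KEY, TAG_VALUE))  # ['i-aaa', 'i-ccc']
```

The same filtering could be pushed server-side with DescribeInstances' "instance-state-name" filter listing the non-terminated states. Either way, an instance can still terminate between the lookup and the command, so a step timeout remains a useful backstop.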
I'm sure there are other options, though. I don't want to open a PR for this, as I feel my "timeoutSeconds" fix is a band-aid rather than a real fix.
I've added the timeout in 0.7.0 (#142), but will leave this issue open until we find a better solution.