mup-aws-beanstalk

Graceful shutdown SSM commands get stuck on terminated instances

Open etyp opened this issue 5 years ago • 2 comments

Graceful shutdown runs SSM commands on all instances matching a certain tag, and instances in the "Terminated" state still match that tag. Trying to run the command on a terminated instance ends in a timeout (the AWS SSM default is 3600 seconds, i.e. 1 hour).

See this thread for others experiencing the issue with SSM on terminated instances.

This can block graceful shutdown commands.

Since these commands usually take 3-4 seconds to run, I think it might be worth putting a timeout on the SSM Document like:

{
  ...
  "mainSteps": [
    {
      "name": "runCommand",
      "action": "aws:runCommand",
      "timeoutSeconds": 10,
      "inputs": ...
    }
  ]
}

I'll test it out to confirm, but let me know if you think that makes sense.
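For illustration, here is a minimal Python sketch of applying that timeout programmatically. It assumes the SSM document is available as a plain dict with the same shape as the snippet above; the helper name `with_step_timeout` is hypothetical and is not part of mup-aws-beanstalk or any AWS SDK.

```python
import copy


def with_step_timeout(document, timeout_seconds=10):
    """Return a copy of an SSM document dict with timeoutSeconds set
    on every aws:runCommand step. The input dict is left untouched."""
    patched = copy.deepcopy(document)
    for step in patched.get("mainSteps", []):
        # Only touch the steps that actually run a command on instances.
        if step.get("action") == "aws:runCommand":
            step["timeoutSeconds"] = timeout_seconds
    return patched
```

Returning a patched copy rather than mutating the input keeps the original document definition reusable, for example if different timeouts are wanted per environment.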

etyp, Jul 23 '19 06:07

Confirmed that this resolves the issue. It's not perfect because you lose a few extra seconds while waiting for the timeout, and that time cuts into the graceful shutdown window.

I think ultimately the graceful shutdown setup needs a way to either:
A. Filter out terminated instances on the CloudWatch event rule
B. Fail immediately when SSM runs against a terminated instance
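As a rough illustration of the filtering idea (done in code rather than on the event rule itself), the Python sketch below picks out instance IDs by state from a response shaped like the EC2 DescribeInstances output (`Reservations` / `Instances` / `State.Name`). The function name `live_instance_ids` and the idea of then targeting SSM by explicit instance ID instead of by tag are assumptions for the example, not how the plugin currently works.

```python
ACTIVE_STATES = frozenset({"running"})


def live_instance_ids(describe_response, allowed_states=ACTIVE_STATES):
    """Collect the IDs of instances whose state is in allowed_states.

    describe_response is a dict shaped like the EC2 DescribeInstances
    response. Instances in any other state (e.g. "terminated") are skipped.
    """
    ids = []
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            if state in allowed_states:
                ids.append(instance["InstanceId"])
    return ids
```

The set of allowed states is a parameter because which states should still receive the shutdown command depends on when the command fires in the instance lifecycle.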

I'm sure there are other options though. I don't want to open a PR for this, as I feel my "timeoutSeconds" fix is a band-aid rather than a real fix.

etyp, Jul 24 '19 02:07

I've added the timeout in 0.7.0 (#142), but will leave this issue open until we find a better solution.

zodern, Sep 30 '21 17:09