Set-up "Instance health checks" with graceful shut-down

Open ababaian opened this issue 5 years ago • 0 comments

There are edge cases of instance errors in which, say, the serratus-align container is not doing any meaningful work (measured by CPU%) and the shutdown procedures fail to catch this and gracefully close the instance and container. We rely on ec2-terminate for this graceful shutdown, but a redundant sudo shutdown -h now or equivalent would be really nice.

One way to implement this is to add "health checks" for the instances: if CPU usage is, say, <5% for a sustained 5-10 minutes, the instance is terminated from outside. There are quite a few cases with serratus-align, serratus-dl and serratus-merge in which a few stragglers are left 'spooling' after scale-in or in the background during a run. In theory this would be a catch-all for several errors and reduce inefficiencies.
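
A minimal sketch of the idle check described above, for discussion only. It swaps the "from outside" termination for a watchdog running on the instance itself, since that is simpler to illustrate. The function names, the 60-second sampling interval and the threshold variables are all assumptions, not existing serratus code.

```shell
#!/usr/bin/env bash
# Hypothetical idle watchdog; all names and thresholds are assumptions.

CPU_THRESHOLD=${CPU_THRESHOLD:-5}     # percent; a sample below this counts as idle
IDLE_LIMIT=${IDLE_LIMIT:-10}          # consecutive idle samples before shutdown
SAMPLE_SECONDS=${SAMPLE_SECONDS:-60}  # seconds between samples

# Print "<busy> <total>" jiffies from the aggregate "cpu" line of a
# /proc/stat-style file (fields user..steal; idle and iowait count as idle).
read_cpu_counters() {
    awk '/^cpu / {
             total = 0
             for (i = 2; i <= NF && i <= 9; i++) total += $i
             print total - ($5 + $6), total
             exit
         }' "${1:-/proc/stat}"
}

# Integer CPU usage (percent) between two "<busy> <total>" samples.
cpu_percent() {
    local busy1=$1 total1=$2 busy2=$3 total2=$4
    local dt=$(( total2 - total1 ))
    if (( dt <= 0 )); then echo 0; return; fi
    echo $(( 100 * (busy2 - busy1) / dt ))
}

# Next idle-streak value: increment when below the threshold, else reset to 0.
next_idle_count() {
    local count=$1 pct=$2
    if (( pct < CPU_THRESHOLD )); then
        echo $(( count + 1 ))
    else
        echo 0
    fi
}

# Sample until the idle streak reaches IDLE_LIMIT, then halt the instance.
watch_idle() {
    local idle=0 prev cur pct
    prev=$(read_cpu_counters)
    while (( idle < IDLE_LIMIT )); do
        sleep "$SAMPLE_SECONDS"
        cur=$(read_cpu_counters)
        # prev and cur are left unquoted on purpose so each splits into two arguments
        pct=$(cpu_percent $prev $cur)
        idle=$(next_idle_count "$idle" "$pct")
        prev=$cur
    done
    echo "  CPU below ${CPU_THRESHOLD}% for ${IDLE_LIMIT} samples; shutting down"
    sudo shutdown -h now
}
```

The block only defines functions; something like `watch_idle &` would have to be started from the instance start-up script, and the defaults above give roughly the 10-minute window mentioned. Resetting the streak on any busy sample keeps a briefly idle but healthy worker from being shut down.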

From serratus/containers/worker.sh

          shutdown)
            (
                flock 200

                echo "  Shutting down instance"
                # TODO: change to shutdown (see below)
                aws ec2 terminate-instances \
                 --region us-east-1 \
                 --instance-ids $INSTANCE_ID

                sleep 300

                # TODO: Add a redundancy for shutdown
                #       to work from inside the container
                #
                # Secondary back-up -- shutdown instance
                # (set to "stopped" state if terminate fails)
                # yum -y install sudo shadow-utils util-linux
                # sudo shutdown -h now
                # sleep 300
                
                false
                exit 0

            ) 200> "$BASEDIR/.shutdown-lock"
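
The redundancy asked for in the TODO could look roughly like the sketch below: try the API termination first, wait, and halt locally if the process is still alive. The function names and the `TERMINATE_WAIT` variable are hypothetical; the two commands are the ones already quoted in worker.sh, and `INSTANCE_ID` is assumed to be set as it is there.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a terminate-then-halt fallback; names are assumptions.

TERMINATE_WAIT=${TERMINATE_WAIT:-300}  # seconds to wait for termination to take effect

# Primary path: ask EC2 to terminate this instance (same call as worker.sh).
terminate_instance() {
    aws ec2 terminate-instances \
        --region us-east-1 \
        --instance-ids "$INSTANCE_ID"
}

# Back-up path: halt from inside the instance.
halt_instance() {
    sudo shutdown -h now
}

shutdown_with_fallback() {
    echo "  Shutting down instance"
    if terminate_instance; then
        # If termination works, this process is killed during the wait.
        sleep "$TERMINATE_WAIT"
    fi
    # Still running: the terminate call failed or has not taken effect.
    echo "  Terminate did not complete; halting instance" >&2
    halt_instance
}
```

Skipping the wait when the API call itself fails means a broken terminate path does not cost an extra five idle minutes. Running this from inside the container would still need `sudo` and `shutdown` available there, as the commented-out `yum` line suggests.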

ababaian avatar Jun 07 '20 20:06 ababaian