ol-infrastructure

Need to know when containers outside the healthcheck path go down

Open Ardiea opened this issue 1 year ago • 0 comments

We don't always know when a Docker container that isn't in the "critical path" has stopped running. For instance, if a traefik or application container crashes, the AWS load balancer will notice that the instance has stopped responding, and it will eventually kill the instance and bring up new ones. But if a celery container crashes, nothing notices, because celery isn't part of the healthcheck in any way, and the service could end up offline. We need some kind of alert for when this happens. If it turns out to happen a lot, we should automate a mitigation and/or find the root cause of this troubling behavior.
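One possible shape for such a check is sketched below: a small script that asks Docker for the state of every container on the host and reports any expected container that is missing or not running. This is only a minimal sketch under stated assumptions. The container names in `EXPECTED_CONTAINERS` and the helper names (`parse_docker_ps`, `find_down_containers`, `check_host`) are hypothetical, and how the result would be turned into an alert (metrics, a pager, etc.) is left open. It relies only on the `docker ps -a --format` CLI with the `{{.Names}}` and `{{.State}}` template fields.

```python
import subprocess

# Hypothetical set of containers that must be running on each instance.
# These names are illustrative; substitute the real compose service names.
EXPECTED_CONTAINERS = {"traefik", "app", "celery"}


def parse_docker_ps(output: str) -> dict[str, str]:
    """Parse tab-separated `docker ps -a --format` output into {name: state}."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, state = line.partition("\t")
        states[name.strip()] = state.strip()
    return states


def find_down_containers(states: dict[str, str], expected: set[str]) -> list[str]:
    """Return expected containers that are absent or not in the 'running' state."""
    return sorted(name for name in expected if states.get(name) != "running")


def check_host(expected: set[str] = EXPECTED_CONTAINERS) -> list[str]:
    """Query the local Docker daemon and list expected containers that are down."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.State}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_down_containers(parse_docker_ps(result.stdout), expected)
```

A periodic job (for example a cron entry or systemd timer, also an assumption here) could call `check_host()` and raise an alert whenever the returned list is non-empty. Comparing against an explicit expected set, rather than just looking for `exited` containers, also catches a container that was removed entirely.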

Ardiea · Mar 31 '23 16:03