ol-infrastructure

forum Healthcheck isn't reliable

Open Ardiea opened this issue 7 months ago • 0 comments

Expected Behavior

If forum can't talk to its MongoDB or OpenSearch backends, the app should crash or stop outright. It should not enter a funky state where the ASG / LB healthcheck passes but the app itself isn't working.
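
To illustrate the fail-fast behavior described above, here is a minimal sketch of a watchdog that exits non-zero once a backend has been unreachable for longer than a grace period. It is not forum's actual code: the function names, the grace period, and the TCP-connect probe are all assumptions made for illustration, and a TCP connect only shows the port is open, not that the backend is answering queries.

```python
import socket
import sys
import time
from typing import Iterable, Optional, Tuple


def backend_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_exit(down_since: Optional[float], now: float, grace_seconds: float) -> bool:
    """True once a backend has been continuously down for at least grace_seconds."""
    return down_since is not None and (now - down_since) >= grace_seconds


def watch(backends: Iterable[Tuple[str, int]],
          grace_seconds: float = 60.0,
          interval: float = 5.0) -> None:
    """Poll the backends and terminate the process on a prolonged outage.

    Exiting non-zero lets the container runtime or ASG replace the instance
    instead of leaving it up in a half-working state.
    """
    backends = list(backends)
    down_since: Optional[float] = None
    while True:
        if all(backend_reachable(host, port) for host, port in backends):
            down_since = None  # recovered, reset the outage clock
        elif down_since is None:
            down_since = time.monotonic()  # start of a new outage
        if should_exit(down_since, time.monotonic(), grace_seconds):
            sys.exit(1)
        time.sleep(interval)
```

Something like `watch([("mongodb.internal", 27017), ("opensearch.internal", 9200)])` (hypothetical hostnames) could run alongside the app so that a lost backend ends the process rather than leaving it idle.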

Current Behavior

If forum can't find its MongoDB or OpenSearch instances for 10 minutes, it just stops looking for them and enters a catatonic state: it is still 'running' well enough for the LB healthchecks to pass, but it isn't really working because it won't answer any requests, and the container is possibly stopped / not listening.

Possible Solution

Put traefik in front of the container to create a healthcheck endpoint that works? Or figure out the behavior of forum and adjust the healthcheck status matcher appropriately.
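
As one way to picture "a healthcheck endpoint that works", here is a rough sketch of a small HTTP endpoint that returns 200 only when both backends accept connections and 503 otherwise. It swaps in a plain Python sidecar rather than traefik, and the `/healthz` path, port 8080, and the backend hostnames are hypothetical, not taken from the forum deployment.

```python
import json
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical backend addresses; replace with the real service endpoints.
BACKENDS = {
    "mongodb": ("mongodb.internal", 27017),
    "opensearch": ("opensearch.internal", 9200),
}


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_report(backends, probe_fn=probe):
    """Probe every backend; return (HTTP status, per-backend results)."""
    checks = {name: probe_fn(host, port) for name, (host, port) in backends.items()}
    status = 200 if all(checks.values()) else 503
    return status, checks


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, checks = health_report(BACKENDS)
        body = json.dumps(checks).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the healthcheck endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

With something like this in place, the LB healthcheck could target `/healthz` and match only on status 200, so an instance that has lost MongoDB or OpenSearch is marked unhealthy instead of passing.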

Additional Details

Discussion starts here and runs until about 4pm that day: https://mitodl.slack.com/archives/C02QLTAE05S/p1721329113019089

Ardiea, Jul 18 '24 19:07