Make healthcheck check a little bit more

Open ShadowJonathan opened this issue 4 years ago • 1 comments

Currently, /health looks like this:

    def render_GET(self, request: Request) -> bytes:
        request.setHeader(b"Content-Type", b"text/plain")
        return b"OK"

This is functionally equivalent to calling /versions on the endpoint.

I think this should do a little more than blindly respond with 'everything is fine', which gives me a similar feeling to the following meme:

image


Jokes aside, I think this endpoint should perform or otherwise "check up" on some basic functionality, and return "not OK" (with a 5XX status) when some precondition isn't met (the preconditions could be defined by other resources).

Maybe this could be linked to an "error counter" that counts the number of exceptions raised in the last minute; the health resource would then return "not OK" if that count exceeds a threshold.
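To make the error-counter idea concrete, here is a minimal sketch of a sliding-window counter feeding a health response. It is not Synapse code: `ErrorWindow`, `health_response`, the 60-second window, and the threshold of 10 are all hypothetical names and values chosen for illustration, and wiring `record_error()` into real exception handling is left open.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class ErrorWindow:
    """Counts errors recorded within the last `window_seconds` seconds.

    Hypothetical helper for illustration; not part of Synapse.
    """

    def __init__(
        self,
        window_seconds: float = 60.0,
        threshold: int = 10,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._threshold = threshold
        self._clock = clock  # injectable so the window can be tested
        self._timestamps: Deque[float] = deque()

    def record_error(self) -> None:
        """Call this wherever an exception is caught or logged."""
        self._timestamps.append(self._clock())

    def recent_errors(self) -> int:
        """Drop timestamps older than the window and return what is left."""
        cutoff = self._clock() - self._window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)

    def is_healthy(self) -> bool:
        """Healthy as long as the recent count does not exceed the threshold."""
        return self.recent_errors() <= self._threshold


def health_response(errors: ErrorWindow) -> Tuple[int, bytes]:
    """Return (status code, body) for a /health-style endpoint."""
    if errors.is_healthy():
        return 200, b"OK"
    return 503, b"NOT OK"
```

In a `render_GET` like the one above, the status code would presumably be applied with `request.setResponseCode(...)` before returning the body. The right threshold depends on the deployment, since a busy homeserver may raise some exceptions during normal operation.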

Other than that, this is open for further ideas.

ShadowJonathan avatar Dec 01 '21 12:12 ShadowJonathan

+1 for this. I've noticed that my client reports it cannot connect to Synapse while the healthcheck continues to return 200 OK. In my case this typically happens after a Postgres restart. Restarting Synapse fixes the problem, but it would be nice if the healthcheck accurately reported the health of the process so that infra automation can remediate.
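One way to catch this failure mode would be for the healthcheck to run a trivial query through the same database connection the application uses, rather than returning a constant. The sketch below is an assumption-laden illustration, not how Synapse works: `database_health` and `make_probe` are hypothetical names, and the standard-library `sqlite3` module stands in for Postgres so the example is self-contained.

```python
import sqlite3
from typing import Callable, Tuple


def database_health(run_probe: Callable[[], object]) -> Tuple[int, bytes]:
    """Return (status code, body) based on whether a probe query succeeds.

    `run_probe` should go through the application's own connection or pool,
    so that a stale connection shows up as a failed healthcheck.
    """
    try:
        run_probe()
    except Exception:
        # Any failure to reach the database is reported as unhealthy.
        return 503, b"NOT OK"
    return 200, b"OK"


def make_probe(conn: sqlite3.Connection) -> Callable[[], object]:
    """Build a probe that runs `SELECT 1` on an existing connection."""
    return lambda: conn.execute("SELECT 1").fetchone()


# sqlite3 stands in for Postgres here; closing the connection simulates
# the connection being lost after a database restart.
conn = sqlite3.connect(":memory:")
probe = make_probe(conn)
status_before = database_health(probe)
conn.close()
status_after = database_health(probe)
```

With a check like this, orchestration tooling that polls the endpoint would see a 5XX after the connection is lost and could restart the process automatically.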

philipcristiano avatar Oct 25 '23 14:10 philipcristiano