Set up a /healthcheck endpoint that can be monitored

Open edeutsch opened this issue 2 years ago • 1 comments

Based on today's AHM discussion: It would be nice to have something like a /healthcheck endpoint that could report substantial problems and could be monitored.

So for example,

When /healthcheck is called, it could:
Make sure that the KP info cache is less than 30 minutes old
Check that there aren't a bunch of stale active processes
Relay any major errors*

It can start small, but ideally be flexible so that we could add more health checks, too.
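To make the "start small but stay flexible" idea concrete, here is a minimal sketch of a check registry, not a proposal for the actual implementation. Every name in it (health_check, run_health_checks, KP_INFO_CACHE_UPDATED_AT, MAX_CACHE_AGE_SECONDS) is hypothetical, and the cache timestamp is a stand-in for wherever the real KP info cache records its last refresh.

```python
import time
from typing import Callable, Dict, Tuple

# Each check returns (ok, human-readable detail).
CheckResult = Tuple[bool, str]

# Hypothetical registry: check name -> check function.
_CHECKS: Dict[str, Callable[[], CheckResult]] = {}


def health_check(name: str):
    """Decorator that registers a function as a named health check."""
    def decorator(func: Callable[[], CheckResult]):
        _CHECKS[name] = func
        return func
    return decorator


# Assumption: stand-in for the time the KP info cache was last refreshed.
KP_INFO_CACHE_UPDATED_AT = time.time()
MAX_CACHE_AGE_SECONDS = 30 * 60  # "less than 30 minutes old"


@health_check("kp_info_cache")
def check_kp_info_cache() -> CheckResult:
    age = time.time() - KP_INFO_CACHE_UPDATED_AT
    return age < MAX_CACHE_AGE_SECONDS, f"cache age is {age:.0f} s"


def run_health_checks() -> dict:
    """Run every registered check and summarize the results.

    A check that raises is reported as failed rather than crashing
    the endpoint, so one broken check cannot hide the others.
    """
    report = {"status": "ok", "checks": {}}
    for name, func in _CHECKS.items():
        try:
            ok, detail = func()
        except Exception as exc:
            ok, detail = False, f"check raised {exc!r}"
        report["checks"][name] = {"ok": ok, "detail": detail}
        if not ok:
            report["status"] = "error"
    return report
```

A /healthcheck handler could return the dictionary from run_health_checks() as JSON, with a non-200 status code when "status" is "error" so that an external monitor can alert on it. Adding a new check (stale processes, for instance) would then only require one more decorated function.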

Footnote* I have often mused about somehow having a response.error() option, something like tell_a_human=True, where not only did the processing end in error, but the condition really ought to be relayed to an administrator rather than buried in a log file that no one is likely to read.
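A rough sketch of what the tell_a_human idea from the footnote might look like. The Response class here is a hypothetical stand-in, not the project's real response object, and notify_admin is an assumed callback (it could send email, post to a chat channel, or feed the /healthcheck report).

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("healthcheck_sketch")


class Response:
    """Hypothetical stand-in for the project's response object."""

    def __init__(self, notify_admin: Optional[Callable[[str], None]] = None):
        self.status = "OK"
        self.messages: List[str] = []
        # Errors flagged for a human; a /healthcheck could relay these.
        self.escalated: List[str] = []
        self._notify_admin = notify_admin

    def error(self, message: str, tell_a_human: bool = False) -> None:
        """Record an error; optionally escalate it to an administrator."""
        self.status = "ERROR"
        self.messages.append(message)
        logger.error(message)
        if tell_a_human:
            self.escalated.append(message)
            if self._notify_admin is not None:
                self._notify_admin(message)
```

For example, Response(notify_admin=send_alert).error("KP info cache refresh failed", tell_a_human=True) would log the error as usual and also pass it to send_alert, where send_alert is whatever notification function the deployment provides.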

Oct 04 '23 21:10 edeutsch

This would be nice for the ITRB endpoints in particular, where we can't even log in to poke around.

Oct 22 '23 17:10 saramsey