rest-server
Add a /health endpoint which does not require authentication
Trying to deploy restic/rest-server:0.10.0 in k8s. I want a URL endpoint that I can use for "livenessProbe" (GET, should return 200).
As far as I can see from some testing, guided by mux.go, everything returns 401 Unauthorized. At the moment, I'm using a naive TCP liveness probe, which is not nearly as expressive or accurate.
It would be nice if there was a /health endpoint (or similar) which did not require authentication, and returned 200 (assuming the server was ok).
I created a WIP pull request with a proposal for a /health endpoint. For now, the endpoint checks for free space (at least 8MB *) and that the repository path is writable. Should we check for more things?
Maybe we can add other checks if and when external auth backends are implemented (see #111 and #70).
* I picked this value because it is the size of one pack, but I don't know if that is correct. Feel free to propose a different value.
Which checks do we really need for a /health endpoint? Checking whether there is enough free space to store at least one pack file is mostly of academic interest. After all, it will only take a very short time to fill up the free space completely, at which point the server will stop accepting new uploads. And I don't think there is a right answer to what the limit should be; probably every admin will want to use a different limit.
If I understand liveness probes correctly, they are used to restart stuck containers. That is, a failed health probe would cause a container restart. However, restarting the rest-server container once a disk has run full is highly problematic, as that would prevent (read) access to the backup repositories.
And I guess a similar reasoning applies to whether the repository path is writeable (although that might be less of a problem).
Agreed, the focus should be on whether killing+restarting this container would help. This isn't a substitute for more comprehensive monitoring.
A quite reasonable first step is to do no extra logic and just respond with 200 OK immediately, from your main HTTP event handler. Even that trivial check still confirms that the program is running, is listening on the correct port, has completed any startup steps, isn't deadlocked or thrashing on OOM, etc.
Just for completeness, don't make 'healthiness' depend on reachability/health of some other remote service. This is a common error and leads to cascading failures.
It would make sense to just add a handler for /health that always returns 200.
I can think of the following additional things to check:
- Check if the .htpasswd file is readable if htpasswd auth is enabled.
- Check if the repo root directory exists.
Restarting rest-server will not resolve any of these, but I can imagine that the failure state of the container/pod is useful to administrators. But I think that rest-server will actually fail to start in those cases anyway, in which case adding these checks is not that useful.
As discussed, free disk space is something that can and should be monitored outside of rest-server.
Perhaps we could add a Prometheus metric for write errors?
As a workaround, you could set the --prometheus-no-auth flag to disable auth on the /metrics endpoint, if you do not mind exposing the metrics or have a reverse proxy in front of the service that can restrict access to that path.
I would love to have this for checking whether my rest-server is up before triggering a remote backup with systemd.
Right now I am using this as the check:
curl --silent --fail --head -L http://192.168.1.100:8000/myrestic-backup-repo-name/config