docker-nfs-server Add healthcheck

I think exportfs provides a fairly reliable way to check whether the container is healthy, so it seems to me make sense to add the healthcheck in the image - ie the healthcheck doesnt require any runtime vars

May 27 '20 08:05 phlax

LGTM. Thank you for your contribution! I'll merge momentarily.

Jul 21 '20 03:07 ehough

hi @ehough - the only thing i wasnt 100% sure about is the optimal settings for retry etc - these seemed reasonable defaults - but not sure

Jul 21 '20 08:07 phlax

i guess the issue with the grace period is significant here

Jul 21 '20 08:07 phlax

OK I think I spoke too soon. The problem is that exportfs's exit code seems like it's always 0. Run this:

docker run --rm alpine sh -c "apk add nfs-utils && exportfs; echo $?"

So exportfs's exit code would indicate to Docker that the container is healthy, even though there could be other problems with the server.

This image does a ton of error checking during boot up, so I think if entrypoint.sh is alive after, say 30 seconds, then we can somewhat safely assume that the container is healthy. Perhaps in the future we could add more sophisticated checks (i.e. ID mapping is requested, ensure that idmapd is running).

Would you be interested in updating this PR to instead check for entrypoint.sh? Something like pidof /usr/local/bin/entrypoint.sh might be enough. I'm happy to make the change but I don't want to clobber your contribution!

i guess the issue with the grace period is significant here

I haven't forgotten about #44. I was testing it this weekend and I kept crashing my laptop (it's running El Capitan, which has major issues with NFSv4). Stand by for updates there.

Jul 23 '20 02:07 ehough

The problem is that exportfs's exit code seems like it's always 0

wierd, i tested this when i made the PR and have been using it since. Not sure what i must have tested 8/ .

Would you be interested in updating this PR to instead check for entrypoint.sh?

sure

Jul 23 '20 06:07 phlax

after a little playing with this, i have come to the conclusion that this healthcheck should probs not be added to the base image, for a few reasons

checking the pid of the entrypoint doesnt tell you anything about the health of the nfs export - and im thinking that even if the entrypoint is hanging it would still return successfully with pidof. Also - unless i misunderstand - this would return healthy from the point the script was started, which is not what you want when waiting for services
probably the more reliable test is to nc test that any ports expected are alive - but these will depend on which nfs version and config etc
its generally only a good idea to add healthchecks to images (rather than in orchestration) if the check is reliable in any situation and wont degrade performance - im not sure that is the case here

i had a quick look at the checks in the entrypoint and my impression is that these are more reliable, but not sure how to indicate from script to env the health of exported dirs

also worth mentioning - https://www.ibm.com/support/pages/understanding-nfs-health-check-datapower - although it seems only tangentially helpful

Jul 23 '20 07:07 phlax

~~also - i might be wrong - but i think this does work after all (it seems to in my deployment)~~

try this:

docker run --rm alpine sh -c "apk add nfs-utils && echo $(exportfs); echo \"Received: $?\""

compared to:

docker run --rm alpine sh -c "apk add nfs-utils && echo $(exit 0); echo \"Received: $?\""

this was wrong - further testing shows it was expanding the $() in the host

Jul 23 '20 07:07 phlax

although not sure 8/

$ docker run -ti --rm alpine sh
/ # apk add nfs-utils && exportfs
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
(1/15) Installing krb5-conf (1.0-r1)
(2/15) Installing libcom_err (1.45.5-r0)
(3/15) Installing keyutils-libs (1.6.1-r0)
(4/15) Installing libverto (0.3.1-r1)
(5/15) Installing krb5-libs (1.17.1-r0)
(6/15) Installing libtirpc (1.1.4-r0)
(7/15) Installing rpcbind (1.2.5-r0)
Executing rpcbind-1.2.5-r0.pre-install
(8/15) Installing libblkid (2.34-r1)
(9/15) Installing libcap (2.27-r0)
(10/15) Installing device-mapper-libs (2.02.186-r0)
(11/15) Installing libevent (2.1.11-r0)
(12/15) Installing libmount (2.34-r1)
(13/15) Installing libnfsidmap (2.4.1-r2)
(14/15) Installing sqlite-libs (3.30.1-r2)
(15/15) Installing nfs-utils (2.4.1-r2)
Executing busybox-1.31.1-r9.trigger
OK: 12 MiB in 29 packages
/ # echo $?
0

Jul 23 '20 07:07 phlax

this seems to work, but not sure about performance or how reliable

/ # nfsstat -s
Error: No Server Stats (/proc/net/rpc/nfsd: No such file or directory). 
/ # echo "$?"
2

edit: i dont seem to be able to get this to work even with successfully exported dirs - i think because docker and /proc - either way it doesnt seem an option

Jul 23 '20 07:07 phlax

@ehough shall i close this?

i think its probs best to leave to individual setups etc

Jul 28 '20 19:07 phlax