memcached Issues with ping-based reconnect

I bumped into an issue with reconnecting to a failed memcached instance. While trying to reproduce it, I found that the ping check performed in lib/connection.js can keep the spawned process alive for a long time (indefinitely?) if the host being checked is not reachable. I would expect it to kill the subprocess after a certain timeout and retry later if needed.

There are two more issues with how the data received from ping are interpreted:

It relies on stdout to mean 'everything is fine' and stderr to mean 'some failure'. However, the Linux (Debian) version of ping, for example, prints to stdout whether there is 0% or 100% packet loss; the failure is indicated by the process exit code.
Once the ping check has run, a 'reconnected' event is emitted, although we don't actually know yet whether memcached is up and running on that host (I suppose #99 is addressing this point).
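
To illustrate what I would expect, here is a minimal sketch and not the code from lib/connection.js. The names pingHost and isReachable are made up for this example, and the '-c 1' argument assumes a Linux/macOS style ping. It kills the subprocess after a timeout and decides success from the exit code only, ignoring stdout and stderr.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: the host counts as reachable only if ping
// exited with code 0 and was not killed by our own timeout.
function isReachable(exitCode, timedOut) {
  return !timedOut && exitCode === 0;
}

// Hypothetical replacement for the ping check (assumed signature).
function pingHost(host, timeoutMs, callback) {
  // '-c 1' sends a single echo request; this is an assumption about
  // the platform, Windows ping uses '-n' instead.
  const child = spawn('ping', ['-c', '1', host], { stdio: 'ignore' });
  let timedOut = false;
  let done = false;

  // Kill the subprocess if it has not exited in time.
  const timer = setTimeout(() => {
    timedOut = true;
    child.kill('SIGKILL');
  }, timeoutMs);

  function finish(exitCode) {
    if (done) return;
    done = true;
    clearTimeout(timer);
    callback(isReachable(exitCode, timedOut));
  }

  // 'error' fires when the process could not be spawned at all.
  child.on('error', () => finish(null));
  // When the child is killed by a signal, the exit code is null.
  child.on('exit', (code) => finish(code));
}
```

Even when the callback receives true, that only says the host answers ping, so a 'reconnected' event at that point would still be premature.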

Dec 16 '13 13:12 egirshov

I can confirm that we're running into the same thing. Even after the app is terminated the ping process is still there.

Apr 18 '14 15:04 raykrueger

Me too. When the memcached server crashed, many ping processes were started and hung, and then the node server crashed.

May 15 '14 09:05 leonhl

This happened to us as well. We lost one of two memcached servers, and two of our 26 node servers hung. We couldn't restart the node processes because the port the server is supposed to listen on was already in use. With lsof, I discovered that ping was listening on that port.

This means that, not only is the ping process not cleaned up, but it is spawned in such a way that it does not close inherited file descriptors, AND shutting down node doesn't clean up the process.
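
As a rough sketch of the cleanup side (the names spawnTracked, killAllChildren and liveChildren are hypothetical, not part of the library), the spawned processes could be tracked and killed when node exits. Passing stdio: 'ignore' keeps the child from sharing the parent's standard streams; whether other descriptors such as a listening socket are inherited depends on the Node version and on how they were opened, and this sketch does not address that.

```javascript
const { spawn } = require('child_process');

// Hypothetical registry of subprocesses that are still running.
const liveChildren = new Set();

// Spawn a process and remember it until it exits or fails to start.
function spawnTracked(command, args) {
  const child = spawn(command, args, { stdio: 'ignore' });
  liveChildren.add(child);
  const forget = () => liveChildren.delete(child);
  child.once('exit', forget);
  child.once('error', forget);
  return child;
}

// Kill everything that is still tracked.
function killAllChildren() {
  for (const child of liveChildren) {
    child.kill('SIGKILL');
  }
  liveChildren.clear();
}

// 'exit' handlers must be synchronous; child.kill() only sends a signal,
// so it is safe to call here.
process.once('exit', killAllChildren);
```

The 'exit' handler does not run if node itself is terminated by a signal without a handler installed, so a real fix would probably also need SIGINT/SIGTERM handlers plus a timeout on each ping.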

Nov 17 '14 17:11 eraserhd
