
Issues with ping-based reconnect

Open egirshov opened this issue 11 years ago • 3 comments

I ran into an issue with reconnecting to a failed memcached instance. While trying to reproduce it, I found that the ping check performed in lib/connection.js can keep the spawned process alive for a long time (indefinitely?) if the host being checked is not reachable. I would expect it to kill the subprocess after a certain timeout and retry later if needed.

There are two more issues with how the data received from ping is interpreted:

  • it treats output on stdout as 'everything is fine' and output on stderr as 'some failure', although, for example, the Linux (Debian) version of ping prints to stdout whether packet loss is 0% or 100% (failure is indicated by the process exit code),
  • once the ping check has been executed, the 'reconnected' event is emitted, even though we don't really know yet whether memcached is up and running on that host (I suppose #99 addresses this point).
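
For illustration, here is a minimal sketch of the behaviour I would expect: the subprocess is killed after a timeout, and the result is derived from the exit code rather than from which stream received output. This is not the library's actual code; `interpretPingResult` and `checkHost` are hypothetical names, and the `-c 1` flag in the usage comment assumes a Linux or BSD style ping.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: decide reachability from how the process ended,
// not from whether it wrote to stdout or stderr.
function interpretPingResult(exitCode, timedOut) {
  if (timedOut) return 'timeout';
  return exitCode === 0 ? 'reachable' : 'unreachable';
}

// Hypothetical helper: run a check command and kill it after `timeoutMs`.
// The callback receives 'reachable', 'unreachable' or 'timeout'.
function checkHost(command, args, timeoutMs, callback) {
  // Output is ignored on purpose; only the exit status matters here.
  const child = spawn(command, args, { stdio: 'ignore' });
  let timedOut = false;
  let finished = false;

  const timer = setTimeout(() => {
    timedOut = true;
    child.kill('SIGKILL'); // do not leave the subprocess hanging around
  }, timeoutMs);

  function finish(exitCode) {
    if (finished) return; // 'error' and 'exit' may both fire
    finished = true;
    clearTimeout(timer);
    callback(interpretPingResult(exitCode, timedOut));
  }

  child.on('error', () => finish(null)); // e.g. command not found
  child.on('exit', (code) => finish(code));
  return child;
}

// Usage (assumed flags, hypothetical host):
// checkHost('ping', ['-c', '1', '10.0.0.5'], 3000, (result) => {
//   console.log(result);
// });
```

Even a 'reachable' result only says that the host answers ping. It says nothing about whether memcached is accepting connections there, so it would not be a sufficient reason to emit 'reconnected'.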

egirshov, Dec 16 '13 13:12

I can confirm that we're running into the same thing. Even after the app is terminated, the ping process is still there.

raykrueger, Apr 18 '14 15:04

Me too. When the memcached server crashed, many ping processes were started and hung, and then the node server crashed.

leonhl, May 15 '14 09:05

This happened to us as well. We lost one of two memcached servers, and two of our 26 node servers hung. We couldn't restart the node processes because the port the server is supposed to listen on was already bound. With lsof, I discovered that ping was listening on that port.

This means that, not only is the ping process not cleaned up, but it is spawned in such a way that it does not close inherited file descriptors, AND shutting down node doesn't clean up the process.
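
As a rough sketch of the cleanup half of this, a parent process could track the children it spawns and kill any that are still alive when it exits. The names `spawnTracked` and `killAllChildren` are hypothetical and not part of the library. This does not by itself address the inherited listening socket, and the 'exit' handler does not run if node is killed with SIGKILL or terminated by a signal that has no handler.

```javascript
const { spawn } = require('child_process');

// Children that have been spawned and have not exited yet.
const liveChildren = new Set();

// Hypothetical helper: spawn a process and remember it until it ends.
function spawnTracked(command, args) {
  const child = spawn(command, args, { stdio: 'ignore' });
  liveChildren.add(child);
  const forget = () => liveChildren.delete(child);
  child.on('exit', forget);
  child.on('error', forget);
  return child;
}

// Hypothetical helper: kill every tracked child and return how many
// kill signals were sent.
function killAllChildren() {
  let killed = 0;
  for (const child of liveChildren) {
    child.kill('SIGKILL');
    killed += 1;
  }
  liveChildren.clear();
  return killed;
}

// Sending a signal is synchronous, so this is safe in an 'exit' handler.
process.on('exit', killAllChildren);
```

To cover SIGTERM and SIGINT as well, handlers for those signals would have to call `process.exit()` so that the 'exit' handler above gets a chance to run.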

eraserhd, Nov 17 '14 17:11