
Retry connection to gearmand server when connection lost

Open recunius opened this issue 9 years ago • 6 comments

When the connection to the Gearman server is lost, GearmaNode currently has no option to re-establish it.

I thought of implementing such a solution outside of GearmaNode (for a worker), but that would be difficult when load balancing between multiple servers. The only straightforward option would be to discard the Worker instance and create a new one. In doing so, either the old Worker would remain connected to the other server(s) that stayed up, or the old Worker would be closed, in which case there would be a period without a Worker connection to any server.

So would you be open to including an optional connection retry mechanism inside of GearmaNode? I haven't thought much about the client connections, but for the worker connections I think all that would be needed is to:

  1. listen to 'socketDisconnect'
  2. retry connecting with options for interval and retryCount
  3. once connected add the server back into the JobManager jobServer list
  4. re-register all known worker functions in Worker.functions
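
Step 2 above could be sketched roughly as follows. This is only an illustration under assumptions, not GearmaNode code: `retryConnect` and its `interval`, `retryCount` and `schedule` options are hypothetical names, and `connect` stands for any function that takes a Node-style callback. Listening for 'socketDisconnect', re-adding the server and re-registering functions (steps 1, 3 and 4) are left to the caller.

```javascript
// Hypothetical helper, not part of GearmaNode: retry an asynchronous
// connect call up to `retryCount` times, waiting `interval` ms between tries.
function retryConnect(connect, options, done) {
  var interval = options.interval || 1000;
  var retryCount = options.retryCount || 5;
  // `schedule` is injectable (assumption) so the helper can run without real timers.
  var schedule = options.schedule || setTimeout;
  var attempts = 0;

  function attempt() {
    attempts += 1;
    connect(function (err) {
      if (!err) {
        return done(null, attempts);   // connected
      }
      if (attempts >= retryCount) {
        return done(err, attempts);    // give up and report the last error
      }
      schedule(attempt, interval);     // wait, then try again
    });
  }

  attempt();
}
```

The `done` callback is where a caller would put the server back into the job server list and re-register the worker functions. Scheduling each attempt only after the previous one has failed avoids overlapping connection attempts.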

I could give it a shot, but does such a plan make sense, or is anyone aware of a workaround or pitfalls to consider first?

recunius avatar Mar 18 '15 19:03 recunius

@veny @recunius I'm also facing the same issue. Any updates on this?

hay-wire avatar May 05 '15 12:05 hay-wire

I also just ran into the same issue...

janober avatar Jun 02 '15 19:06 janober

Hi all, I've put this task on my roadmap. It should be done in June.

veny avatar Jun 18 '15 21:06 veny

@veny What's the status of this feature?

vatson avatar Nov 26 '15 17:11 vatson

I just managed to find a solution for workers that doesn't require changing the library itself:

var gearmanode = require('gearmanode');
var protocol = require('gearmanode/lib/gearmanode/protocol');

var failoverStrategy = function(server, added_functions) {
  server.clientOrWorker.once('socketDisconnect', function() {
    var retry = setInterval(function () {
      server.connect(function(e){
        if(!e) {
          //Let server know about functions that registered workers can do
          for (var i in added_functions) {
            server.send(protocol.encodePacket(protocol.PACKET_TYPES.CAN_DO, [added_functions[i]]));
          }
          server.send(protocol.encodePacket(protocol.PACKET_TYPES.PRE_SLEEP));

          // Stop retrying first, then re-arm the handler for the next disconnect
          clearInterval(retry);
          failoverStrategy(server, added_functions);
        }
      });
    }, 1000);
  });
};

var servers = [];
var added_functions = [];
var worker = gearmanode.worker({servers: servers});

worker.addFunction('ping', function(job){
  job.workComplete('pong')
});
added_functions.push('ping');

for (var i in worker.jobServers) {
  failoverStrategy(worker.jobServers[i], added_functions);
}

Perhaps it will be useful to someone until we have a built-in implementation.

FYI @veny @recunius @hay-wire

vatson avatar Nov 27 '15 12:11 vatson

No updates for a while, so I was wondering whether this will still get fixed or whether vatson's workaround should be used?

janober avatar Apr 05 '17 18:04 janober