log-ship-elastic-postfix

logship gets stuck if started when elasticsearch is down

Open • crazy-man opened this issue 8 years ago • 1 comment

Hello. We ran into a situation where, after a machine restart, the service running log-ship-elastic-postfix starts before elasticsearch is available. In this case the callback passed to this.elastic.ping(function (err) (lib/logship.js, line 38) just returns after printing:

{ Error: No Living connections
    at sendReqWithConnection (/media/sf_solegate_core/libs/mail_tracking/node_modules/elasticsearch/src/lib/transport.js:207:15)
    at next (/media/sf_solegate_core/libs/mail_tracking/node_modules/elasticsearch/src/lib/connection_pool.js:213:7)
    at _combinedTickCallback (internal/process/next_tick.js:67:7)
    at process._tickCallback (internal/process/next_tick.js:98:9) message: 'No Living connections' }

and the reader is never started. The process then sits idle for 6 hours, until watchdog() kills it. We have an external service monitoring tool (monit) that monitors all our services and restarts them if needed. The desired behavior in this case would be to just shut down the process (or to use a retry mechanism like the one in PostfixToElastic.prototype.doneQueue). A simple solution is to put p2e.shutdown(); before the return in the this.elastic.ping callback. If this behavior is intended and there is another way to detect that nothing is running, I will be glad to hear it.

Thank you, Constantine

crazy-man, Nov 20 '16 14:11

You really don't want to shut down immediately, or else you'll be spawning new node.js processes in a furious tail-chasing exercise. I solved this problem by making the ES cluster redundant so that it was always up, and then ignored this particular issue.

I think a good solution is to wrap that ping function in a simple version of async.until(), track the number of connection attempts, and then retry after attempts * 10 seconds.

msimerson, Nov 20 '16 17:11
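
For illustration, the retry idea suggested in the reply (count the connection attempts and wait attempts * 10 seconds before pinging again) might look roughly like the sketch below. It is not taken from logship: pingWithRetry, maxAttempts, stepMs and schedule are hypothetical names, and the attempt limit is an assumption. The ping argument is any function that accepts a node-style callback, so in lib/logship.js it could presumably be something like (cb) => this.elastic.ping(cb).

```javascript
'use strict';

// Hypothetical helper, not part of log-ship-elastic-postfix.
// Calls `ping` until it succeeds or `maxAttempts` is reached, waiting
// attempts * stepMs between tries (10 s, 20 s, 30 s, ... by default).
function pingWithRetry(ping, options, done) {
  const maxAttempts = options.maxAttempts || 10;   // assumed limit
  const stepMs = options.stepMs || 10 * 1000;      // "attempts * 10 seconds"
  const schedule = options.schedule || setTimeout; // injectable, eases testing
  let attempts = 0;

  function attempt() {
    attempts += 1;
    ping(function (err) {
      // Success: report how many attempts it took.
      if (!err) return done(null, attempts);
      // Out of attempts: hand the last error back to the caller.
      if (attempts >= maxAttempts) return done(err, attempts);
      // Otherwise wait a little longer each time, then try again.
      schedule(attempt, attempts * stepMs);
    });
  }

  attempt();
}
```

On final failure the caller could log the error and call p2e.shutdown(), leaving the restart to an external monitor such as monit. Because the delays grow with each attempt, that restart would happen at most once per retry cycle rather than in a tight loop.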