log-ship-elastic-postfix
logship gets stuck if started when elasticsearch is down
Hello,
We ran into a situation where, after a machine restart, the service running log-ship-elastic-postfix starts before Elasticsearch is available.
In this case the callback passed to this.elastic.ping(function (err) ...) (lib/logship.js, line 38) just returns after printing:
{ Error: No Living connections
at sendReqWithConnection (/media/sf_solegate_core/libs/mail_tracking/node_modules/elasticsearch/src/lib/transport.js:207:15)
at next (/media/sf_solegate_core/libs/mail_tracking/node_modules/elasticsearch/src/lib/connection_pool.js:213:7)
at _combinedTickCallback (internal/process/next_tick.js:67:7)
at process._tickCallback (internal/process/next_tick.js:98:9) message: 'No Living connections' }
and the reader is not started.
The process then sits idle for 6 hours, until watchdog() kills it.
We have an external service monitoring tool (monit) that monitors all our services and restarts them if needed.
The desired behavior in this case would be to just shut down the process (or to use some retry mechanism like the one found in PostfixToElastic.prototype.doneQueue).
A simple solution is to put p2e.shutdown(); before the return in the this.elastic.ping callback.
If the current behavior is intended and there is another way to detect that nothing is running, I would be glad to hear it.
Thank you, Constantine
You really don't want to shut down immediately, or else you'll be spawning new node.js processes in a furious tail-chasing exercise. I solved this problem by making the ES cluster redundant so that it was always up, and then ignored this particular issue.
I think a good solution is to wrap that ping function in a simple version of async.until(), track the number of connection attempts, and then retry after attempts * 10 seconds.
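Something along these lines might work as a starting point. It is only a rough sketch of the retry idea, not code from lib/logship.js: pingUntilAlive, maxAttempts, stepMs and schedule are hypothetical names, and it uses a small hand-rolled loop rather than async.until() so that it has no dependencies. The ping argument is any function that takes a node-style callback, the way the this.elastic.ping callback above does.

```javascript
'use strict';

// Hypothetical helper, NOT part of log-ship-elastic-postfix.
//   ping(cb)   any function that calls cb(err), e.g. a wrapper around elastic.ping
//   opts       maxAttempts: failures tolerated before giving up (assumed default 10)
//              stepMs:      base delay, 10 seconds as suggested above
//              schedule:    timer function, setTimeout unless replaced (handy in tests)
//   done(err, attempts)     called once, on success or after giving up
function pingUntilAlive(ping, opts, done) {
  const maxAttempts = opts.maxAttempts || 10;
  const stepMs = opts.stepMs || 10 * 1000;
  const schedule = opts.schedule || setTimeout;
  let attempts = 0; // number of failed pings so far

  function attempt() {
    ping(function (err) {
      if (!err) return done(null, attempts); // Elasticsearch answered
      attempts++;
      if (attempts >= maxAttempts) return done(err, attempts); // give up
      // Linear backoff: wait attempts * 10 seconds before the next ping.
      schedule(attempt, attempts * stepMs);
    });
  }

  attempt();
}

// Possible wiring (assumes p2e exposes the client as p2e.elastic):
//
// pingUntilAlive(function (cb) { p2e.elastic.ping(cb); }, {}, function (err) {
//   if (err) return p2e.shutdown(); // let monit restart the service
//   /* start the reader here */
// });
```

With the defaults this waits 10 s, 20 s, 30 s and so on between pings, so the process gives up after roughly 7.5 minutes instead of sitting idle until watchdog() fires. Calling p2e.shutdown() only after the last attempt keeps an external monitor such as monit from respawning the process in a tight loop.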