fluent-plugin-mongo
fluent-plugin-mongo copied to clipboard
failed to flush the buffer - Mongo_replset does not reconnect to replica set
We're using td-agent 2.3.1 and we are getting the following error messages from time to time.
After the error message the output plugin out_mongo_replset
does not reconnect to the replica set and stays in this stuck state until it's restarted. No new messages are saved in MongoDB.
td-agent[18558]: 2016-09-24 22:02:21 +0000 [warn]: Failed to connect to Replica Set. Try to retry: retry number = 1
td-agent[18558]: 2016-09-24 22:02:21 +0000 [warn]: retry succeeded. plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:09:52 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-09-24 22:09:33 +0000 error_class="NameError" error="uninitialized constant IRB::Abort" plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:09:52 +0000 [warn]: suppressed same stacktrace
td-agent[18558]: 2016-09-24 22:10:20 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-09-24 22:09:35 +0000 error_class="NameError" error="uninitialized constant IRB::Abort" plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:10:20 +0000 [warn]: suppressed same stacktrace
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-09-24 22:09:39 +0000 error_class="Mongo::OperationTimeout" error="Timed out waiting on socket read." plugin_id="object:153dc74"
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:336:in `rescue in receive_message_on_socket'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:330:in `receive_message_on_socket'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:191:in `receive_header'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/networking.rb:182:in `receive'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/authentication.rb:446:in `auth_command'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/authentication.rb:409:in `issue_scram'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/authentication.rb:225:in `issue_authentication'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:321:in `block in check_auths'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/set.rb:263:in `each_key'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/2.1.0/set.rb:263:in `each'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:320:in `check_auths'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:285:in `block (2 levels) in checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:267:in `synchronize'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:267:in `block in checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:260:in `loop'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/connection/pool.rb:260:in `checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:413:in `get_socket_from_pool'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:371:in `block in checkout_reader'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:354:in `checkout'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/mongo_replica_set_client.rb:369:in `checkout_reader'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:623:in `checkout_socket_from_connection'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:550:in `block in send_initial_query'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:55:in `block in instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:20:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:54:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:547:in `send_initial_query'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:532:in `refresh'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/cursor.rb:139:in `next'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/db.rb:607:in `command'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:353:in `block in batch_message_send'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:55:in `block in instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:20:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/functional/logging.rb:54:in `instrument'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:352:in `batch_message_send'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:82:in `block in batch_write_incremental'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:57:in `catch'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection_writer.rb:57:in `batch_write_incremental'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection.rb:1184:in `batch_write'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/mongo-1.12.5/lib/mongo/collection.rb:411:in `insert'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo.rb:154:in `operate'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo_replset.rb:45:in `block in operate'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo_replset.rb:61:in `rescue_connection_failure'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo_replset.rb:44:in `operate'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-mongo-0.7.12/lib/fluent/plugin/out_mongo.rb:133:in `write'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:345:in `write_chunk'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:324:in `pop'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:329:in `try_flush'
td-agent[18558]: 2016-09-24 22:10:40 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:140:in `run'
systemd[1]: Stopping td-agent...
Please paste your configuration here.
I checked mongo driver and it seems to delete closed socket: https://github.com/mongodb/mongo-ruby-driver/blob/31c59b900dc8223ab97e4019d8fafb8f197754f1/lib/mongo/connection/pool.rb#L294 Could you give me the result of sigdump?
http://docs.fluentd.org/articles/trouble-shooting#dump-fluentd-internal-information
I want to know which function call causes this problem.
I suspect the issue may be connected to MongoDB changing primary since during the election there's no primary and logs cannot be flushed.
I have already restarted td-agent, will the sigdump help or should I wait until the issue appears again?
Config
<source>
type forward
port 24224
bind 127.0.0.1
@label @docker
</source>
<label @docker>
<filter docker.*>
type record_transformer
remove_keys app_id,container_name
</filter>
<match docker.*>
type mongo_replset
tag_mapped
remove_tag_prefix docker.
flush_interval 10s
nodes ip1:27017,ip2:27017
name rs0
user user1
password xxx
database logs
capped
capped_size 2097152
</match>
<match ssl.*>
type mongo_replset
tag_mapped
flush_interval 10s
nodes ip1:27017,ip2:27017
name rs0
user user1
password xxx
database logs
capped
capped_size 524288
</match>
<match **>
type mongo_replset
tag_mapped
flush_interval 10s
nodes ip1:27017,ip2:27017
name rs0
user user1
password xxx
database logs
capped
capped_size 2097152
</match>
</label>
<label @accesslogs>
<filter **>
type record_transformer
renew_record true
<record>
tag ${tag_parts[3]}
httpStatusCode ${code}
upstreamTime ${upstream_time}
</record>
</filter>
<match **>
type mongo_replset
flush_interval 10s
nodes ip1:27017,ip2:27017
name rs0
user user1
password xxx
database access_logs
collection http_status_queue
</match>
</label>
will the sigdump help or should I wait until the issue appears again?
Yeah. If the problem happens again, sigdump result helps problem investigation.
I did a couple sigdump's, since I wasn't sure if and where it was saved...
Is there any other info I can provide? We've experienced this issue pretty often recently.
Hmm... hard to debug because it seems the problem happens inside mongo driver, not plugin. Could you update fluent-plugin-mongo to v0.8.0? It uses mongo driver v2.x.
Sure, I can give it a try. Any hints how to update td-agent to use the v0.8.0?
See: http://docs.fluentd.org/v0.12/articles/plugin-management
I have just deployed the new plugin to one of our nodes. So far it seems that fluentd starts and works ok.
However I didn't find a way to specify multiple hosts for a replica set. There's only one host
config var available. How can I specify multiple hosts as with the old nodes
config var?
It is the regression of updating mongo driver to v2.x. I discussed this problem on https://github.com/fluent/fluent-plugin-mongo/issues/88. Maybe, @megamk will send a patch.
I can't get a response from megamk, so I wrote a patch for this. v0.8.1 has nodes parameter for replica set. Or you can use connection_string with Mongo URI.
I can't get a response from megamk, so I wrote a patch for this.
I'm sorry, my ruby knowledge is not enough to dive into how your system tests work so I was planning to do this in my spare time before making a PR. But I couldn't find time for that yet, sorry.
Great that you've adopted my suggestion though, thanks!