Stuck in Moped::Errors::ConnectionFailure: Could not connect to a primary node for replica set

Open dblock opened this issue 9 years ago • 29 comments

Moped::Errors::ConnectionFailure: Could not connect to a primary node for replica set #<Moped::Cluster:128953180 @seeds=[<Moped::Node resolved_address="10.95.128.244:27017">, <Moped::Node resolved_address="10.184.156.102:27017">]>

…avity/ruby/2.0.0/gems/moped-2.0.3/lib/moped/cluster.rb: 254:in `with_primary'
…ty/ruby/2.0.0/gems/moped-2.0.3/lib/moped/collection.rb: 124:in `insert'
…by/2.0.0/gems/mongoid-4.0.0/lib/mongoid/query_cache.rb: 117:in `insert_with_clear_cache'
…ems/mongoid-4.0.0/lib/mongoid/persistable/creatable.rb:  79:in `insert_as_root'

Occasionally we see a machine or two stuck in this. I am not sure when this happens, but about 10% of nodes end up in this state every 24 hours. The MongoDB cluster is doing fine.

This issue could probably use more detail; please tell me what to look for the next time I have a machine in this state.

Feb 08 '15 12:02 dblock

Hi @dblock, could you check whether the code in #352 solves this problem?

Feb 13 '15 23:02 wandenberg

@dblock Please see my PR #338. We were having these errors too, and I'm guessing that you are actually having a pool saturation problem and not primary node connection issues. In general, the logging in mongoid is pretty terrible. Are you running Puma? And have you tuned pool_size and pool_timeout?

Feb 20 '15 15:02 niedfelj

How do you go about tuning those? How do you know what to set them to? Are there guidelines?

Feb 20 '15 16:02 steve-rodriguez

In general, you should have a pool_size that is equal to or greater than the number of threads you are running. You shouldn't need to tune pool_timeout. Here is an update submitted to mongoid for generating the mongoid.yml, giving more details on those configs:

https://github.com/mongoid/mongoid/pull/3883/files
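
The rule of thumb above (a pool_size at least as large as the number of threads in the process) can be written down as a small check. This is only a sketch: `pool_size_sufficient?` and `recommended_pool_size` are hypothetical helper names, not part of mongoid or moped, and the numbers are example values.

```ruby
# Hypothetical helpers (not part of mongoid or moped) illustrating the
# rule of thumb: a process needs at least one pooled connection per thread.

# Returns true when every thread in the process can hold a connection at once.
def pool_size_sufficient?(pool_size, thread_count)
  pool_size >= thread_count
end

# Smallest pool size that satisfies the rule for a given thread count.
def recommended_pool_size(thread_count)
  [thread_count, 1].max
end

threads = 16 # example value; substitute your own per-process thread count
puts pool_size_sufficient?(5, threads) # prints false
puts recommended_pool_size(threads)    # prints 16
```

The check is per process: each app server or worker process has its own pool, so the thread count to compare against is the one configured for that process.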

Feb 20 '15 16:02 niedfelj

These PRs might also be useful to you; they add more and better logging in error situations and give per-request metrics in Rails:

https://github.com/mongoid/mongoid/pull/3885 https://github.com/mongoid/mongoid/pull/3884

Feb 20 '15 16:02 niedfelj

#352 has so far been good to us in production (72 hours), so I'd say it has improved things.

Feb 20 '15 17:02 dblock

I'm seeing this error as well.

Feb 23 '15 16:02 fedenusy

+1. We see this a couple of times per day, seemingly on a random basis.

Mar 08 '15 03:03 ajsharp

+1 also seeing this.

Mar 10 '15 09:03 wnkz

I think Moped also uses code that is not actually thread-safe.

https://github.com/mongoid/moped/pull/353#issuecomment-79422271

Mar 13 '15 21:03 InvisibleMan

Interesting. Does anyone see this behavior with unicorn? I've seen it with puma (threads), but don't have anything in production with unicorn.

Wondering if switching the app server to unicorn might be an easy "fix", because it seems like the real fix could take a bit of time.

Mar 13 '15 21:03 ajsharp

@arthurnn any thoughts on this issue?

Mar 13 '15 22:03 ajsharp

I'm using the sidekiq gem, so I have no choice.

Mar 16 '15 09:03 InvisibleMan

I just spent 20 minutes debugging an issue with this error message, and I found that calling .find(nil) in moped results in this (incorrect) error message.

> session[:test].find(nil).first
Moped::Errors::ConnectionFailure: Could not connect to a primary node for replica set #<Moped::Cluster:69729780 @seeds=[<Moped::Node resolved_address="127.0.0.1:27017">]>

Whereas without arguments it's ok:

> session[:test].find().first
=> nil

The expected error message would be something along the lines of InvalidFind.
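
Until the message is improved, one way to surface the real cause earlier is to validate the selector before it reaches the driver. A minimal sketch, assuming a hypothetical `checked_find` wrapper (not a moped API); `FakeCollection` is a stand-in used only so the example runs without a database:

```ruby
# Hypothetical wrapper, not part of moped: rejects a nil selector up front
# so the caller gets an ArgumentError instead of a misleading ConnectionFailure.
def checked_find(collection, selector = {})
  raise ArgumentError, "selector must not be nil" if selector.nil?

  collection.find(selector)
end

# Stand-in for a collection; it ignores the selector and returns its docs.
FakeCollection = Struct.new(:docs) do
  def find(_selector)
    docs
  end
end

collection = FakeCollection.new([])
checked_find(collection).first # nil, since the stand-in collection is empty

begin
  checked_find(collection, nil)
rescue ArgumentError => e
  puts e.message # prints "selector must not be nil"
end
```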

Mar 21 '15 14:03 glebtv

Jun 06 '15 14:06 sahin

Who is still having this problem and can help me set up an environment and describe how to reproduce it?

Jun 06 '15 19:06 wandenberg

+1 @wandenberg, we still have this problem in production. It is simple to reproduce: shut down one of the servers in the replica set, or close its port.

Jun 07 '15 13:06 sahin

+1. Monkey-patching POOL_SIZE to a higher value seems to give more time between errors. Also, it looks like sidekiq is playing a major role: I have 90 sidekiq workers across 3 servers, plus 10 or so unicorns. I still don't get the pool size of 5...

Jun 07 '15 23:06 nofxx

Jul 01 '15 16:07 davidleroy

:+1:

Jul 01 '15 16:07 brand-it

We're seeing this error crop up in some sidekiq jobs.

Oct 09 '15 17:10 mhuggins

+1, still see the issue

Moped::Errors::ConnectionFailure

Could not connect to a primary node for replica set #<Moped::Cluster:50526920 @seeds=[<Moped::Node resolved_address="10.23.84.206:27018">, <Moped::Node resolved_address="10.23.84.207:27018">]>

traceback

vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/cluster.rb:254:in `with_primary'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/read_preference/primary.rb:55:in `block in with_node'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/retryable.rb:30:in `call'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/retryable.rb:30:in `with_retry'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/retryable.rb:39:in `rescue in with_retry'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/retryable.rb:29:in `with_retry'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/retryable.rb:39:in `rescue in with_retry'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/retryable.rb:29:in `with_retry'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/read_preference/primary.rb:54:in `with_node'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/cursor.rb:139:in `load_docs'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/query_cache.rb:234:in `block in load_docs'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/query_cache.rb:135:in `with_cache'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/query_cache.rb:234:in `load_docs'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/cursor.rb:28:in `each'
vendor/bundle/ruby/gems/moped-2.0.7/lib/moped/query.rb:78:in `each'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/contextual/mongo.rb:122:in `each'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/contextual.rb:20:in `each'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/criteria/findable.rb:107:in `entries'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/criteria/findable.rb:107:in `from_database'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/criteria/findable.rb:75:in `multiple_from_db'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/criteria/findable.rb:19:in `execute_or_raise'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/criteria/findable.rb:40:in `find'
vendor/bundle/ruby/gems/mongoid-4.0.2/lib/mongoid/findable.rb:90:in `find'
....
....

Jul 26 '16 10:07 chenqiangzhishen

How do you properly set up a moped pool if not using mongoid? Here is how I'm doing it, and still occasionally getting these errors:

$mongo_pool = ConnectionPool.new(size: 30, timeout: 3000) do
  mongo_client = Moped::Session.new(Moped::Uri.new(uri_string).hosts)
  mongo_client.use(dbname)
end

# have one main one open
mongo_client = Moped::Session.new(Moped::Uri.new(uri_string).hosts)
$mongo = mongo_client.use(dbname)

where uri_string is in the format: mongodb://1.2.3.4:27017/desired_db_name

Might end up just dropping moped as I'm not even using mongoid and that seems to be the biggest use/support case :/

Nov 30 '16 16:11 dennislysenko

It could be that mongo is not running. Have you tried:

sudo rm /var/lib/mongodb/mongod.lock
sudo service mongodb start

Mar 29 '17 13:03 elenatanasoiu

@elenatanasoiu the problem is that mongod is running and replica set is healthy but these error messages crop up nevertheless

Apr 18 '17 19:04 deepthawtz

Hey, did you find a solution?

Aug 01 '17 09:08 bastoune

@bastoune we used sidekiq for background jobs and puma as the app server. Both are multithreaded, supporting 25 and 16 threads by default. Mongoid's default pool size is 5, so evidently there were situations in which the pool was exhausted, resulting in:
Moped::Errors::ConnectionFailure: Could not connect to a primary node for replica set #<Moped::Cluster:1223353180 @seeds=[<Moped::Node resolved_address="xx.xxx.xxx.xxx:27017">, <Moped::Node resolved_address="xx.xxx.xxx.xxx:27017">]>

Fixed it by tuning the pool size together with the sidekiq and puma thread counts. Here is an article; it is about SQL databases, though I suppose it clarifies the fundamentals.
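
The exhaustion described above can be illustrated with a toy pool. This is a sketch under stated assumptions: `TinyPool` is a hypothetical class, not moped's or mongoid's pool implementation, and the non-blocking checkout stands in for a pool timeout expiring. The sizes 5 and 25 are the figures mentioned in this comment.

```ruby
require "thread"

# Toy fixed-size pool (hypothetical, not moped's implementation) showing why
# more concurrent workers than connections leads to checkout failures.
class TinyPool
  class Exhausted < StandardError; end

  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << "conn-#{i}" }
  end

  # Non-blocking checkout: raises instead of waiting, which stands in for
  # a pool timeout expiring.
  def checkout
    @available.pop(true)
  rescue ThreadError
    raise Exhausted, "no connection available"
  end

  def checkin(conn)
    @available << conn
  end
end

pool = TinyPool.new(5) # pool size of 5, as mentioned above
workers = 25           # sidekiq thread count mentioned above

held = []
failures = 0
workers.times do
  begin
    # Each worker keeps its connection, simulating work still in flight.
    held << pool.checkout
  rescue TinyPool::Exhausted
    failures += 1
  end
end

puts "served=#{held.size} failed=#{failures}" # prints served=5 failed=20
```

With the pool raised to 25, every worker in this toy model is served; that is the effect the tuning is after. Whether the driver reports exhaustion with exactly the error shown above is this thread's observation, not something the sketch demonstrates.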

Sep 11 '17 18:09 shivamv

@shivamv Thanks for the reply, going to spend more time to understand this ;)

Sep 28 '17 14:09 bastoune

@shivamv thanks, that is helpful. Maybe somebody forgot to turn on the Docker container that has mongoid inside?

Aug 27 '18 03:08 yanghoxom
