If the configuration contains hosts that are down, performance is severely degraded

Open · michaeldauria opened this issue 11 years ago • 13 comments

Moped 1.5.2

We are experiencing an issue with a 3-host replica set: if we take one host down, every Ruby process experiences a delay on every request. This applies to background jobs via Sidekiq as well. As soon as we take the bad node out of the configuration, everything goes back to normal.

michaeldauria avatar Feb 11 '14 15:02 michaeldauria

I am experiencing the same issue. I captured tcpdumps and can confirm that Moped is still trying to contact inactive replica set members. I've used moped 1.5.1 and 1.5.2 for debugging.

CITguy avatar Feb 20 '14 16:02 CITguy

Same problem here. We have set up a replica set with 4 hosts. As soon as a secondary is down (e.g. for maintenance), the application response time (not the response time of the db!) goes up tremendously, from 150ms to well over 2500ms. If all members are up and reachable, everything goes back to normal. We use mongoid-3.1.6 and moped-1.5.2 with Rails 3.2.17.

dankie avatar Mar 18 '14 12:03 dankie

Same problem here with the default settings:

    timeout: 5
    down_interval: 30

I gather that the following is happening:

  1. The node goes down.
  2. The connection to the node times out after 10s (twice the timeout) on the {:ismaster=>1} command against the admin database.
  3. Moped marks the node as down, and requests are fine again for a short while.
  4. After 30 seconds, the down_interval kicks in, and Moped tries to connect to the node again. This repeats the process from step 2.

This causes the process to be unresponsive for 10s out of every 30s. This behaviour is quite unexpected, since it is only a secondary that is down, and the system should be able to continue working without any significant degradation.
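To make the sequence above concrete, here is a minimal, hypothetical Ruby sketch of such a down-node retry loop. This is not Moped's actual code; the Node struct and the ping stand-in are invented for illustration.

    require 'timeout'

    # Stand-in for a replica set member; not Moped's real data structure.
    Node = Struct.new(:address, :up, :down_since)

    # Stand-in for the {:ismaster=>1} command: an unreachable host never
    # answers, so the caller blocks until the timeout fires.
    def ping(node)
      sleep 60
    end

    def check_node(node, timeout: 5, down_interval: 30)
      return if node.up
      return if Time.now - node.down_since < down_interval # still backing off

      Timeout.timeout(2 * timeout) do # twice the timeout, as observed above
        ping(node)
        node.up = true
      end
    rescue Timeout::Error
      node.down_since = Time.now # marked down again; retried after down_interval
    end

    # Once down_interval has elapsed, the retry happens on the request path,
    # blocking the calling process for up to 2 * timeout = 10 seconds.
    node = Node.new('db3.example.com:27017', false, Time.now - 31)
    check_node(node) # blocks for ~10s, then gives up until the next interval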

Currently my workaround is to change the settings to something like the following:

    timeout: 1.5
    down_interval: 120

This would cause the process to only be unresponsive for 3 seconds every 2 minutes, which is more acceptable.
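For reference, these options go in the session's options block of mongoid.yml in a Mongoid 3 / Moped 1.x setup; the database name and hosts below are placeholders:

    production:
      sessions:
        default:
          database: my_app_production
          hosts:
            - db1.example.com:27017
            - db2.example.com:27017
            - db3.example.com:27017
          options:
            timeout: 1.5        # connection/operation timeout in seconds
            down_interval: 120  # seconds before a down node is retried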

@durran Is this the expected behaviour of Moped? Would the adjusted settings have any significant side-effects? Did the behaviour change in Moped 2.0, or would it still behave the same?

rkistner avatar Mar 18 '14 12:03 rkistner

Just adding my variation of this to the issue. I see the same very significant performance degradation even in this configuration scenario:

  • One secondary replica is down
  • mongoid.yml is set to consistency: :strong (which, if I read the Mongoid docs correctly, means both reads and writes go to the primary): http://mongoid.org/en/mongoid/docs/installation.html#replica

The behaviour was unexpected to me and my colleagues. A clarification of the expected behaviour during failover scenarios would be appreciated.
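For context, that setting sits under the same options block shown in the mongoid.yml example earlier in this thread; a minimal sketch:

    options:
      consistency: :strong  # route both reads and writes to the primary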

eimermusic avatar May 07 '14 12:05 eimermusic

Is this going to make it into Milestone 2.0.0? I see that rc1 is in the pipeline and this seems like a really major flaw...

michaeldauria avatar May 09 '14 14:05 michaeldauria

We have just experienced this in production using Mongoid 3.1.2 / Moped v1.4.5. Can anyone confirm that this fix made it into Moped v2?

Dave

daveharris avatar Aug 24 '14 22:08 daveharris

We've also experienced this issue; it seems a fix for this didn't make it in.

thijsc avatar Sep 17 '14 08:09 thijsc

Just ran into a variation of this issue that is even worse. Some commands in MongoDB lock the entire database, for example the compact command (which places the node in the RECOVERING state). For this reason MongoDB recommends that this command be run only on secondaries. Unless you specifically read from secondaries, it should have no impact on production.

The issue is that Moped connects to every node in the replica set and issues an { ismaster: 1 } request, which also hangs until the compact command is complete. This causes the entire application to hang until the command finishes, which could take hours.
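A minimal sketch of this failure mode, assuming a Moped session pointed at the replica set (hosts, database, and collection names are hypothetical):

    require 'moped'

    session = Moped::Session.new(
      %w[db1.example.com:27017 db2.example.com:27017 db3.example.com:27017],
      database: 'my_app'
    )

    # Step 1: on a secondary, start a compact from the mongo shell against
    # that node directly:
    #   db.runCommand({ compact: "events" })
    # The node enters the RECOVERING state and stops answering promptly.

    # Step 2: an ordinary query through Moped can now block, because the
    # driver's node refresh sends { ismaster: 1 } to every member of the
    # set, including the recovering one:
    session[:events].find.first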

One of the main selling points of MongoDB is that it is able to handle nodes falling over, but with these issues that is not the case.

Note: this happened on Moped 1.5.0. I haven't tested newer versions yet, please tell me if it has been fixed since then.

rkistner avatar Oct 02 '14 15:10 rkistner

Moped 2.0 with Mongoid 4.0 has the same problem. I just tested failover before a release and was very surprised. Looking forward to the fix.

volodymyr-mykhailyk avatar Oct 28 '14 16:10 volodymyr-mykhailyk

We fixed a big memory leak in moped 2.0.1 related to node refresh. Do you want to give the latest moped (2.0.2) a try and see if you still have this performance issue? Thanks.

arthurnn avatar Nov 18 '14 20:11 arthurnn

Adding my $0.02. The issue still exists on mongoid 4.0.1 and moped 2.0.3. I noticed that if the machine is running (and accessible) but mongod is down, then queries resolve as expected. It really looks like a connection timeout when trying to reach a dead node.
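One way to see that difference with plain Ruby sockets (the addresses below are examples; 10.255.255.1 stands in for an unreachable machine):

    require 'socket'

    def time_connect(host, port, connect_timeout: 10)
      start = Time.now
      Socket.tcp(host, port, connect_timeout: connect_timeout) {}
      puts format('%s:%d connected in %.2fs', host, port, Time.now - start)
    rescue SystemCallError => e
      # ECONNREFUSED arrives almost instantly; ETIMEDOUT only after the
      # full connect_timeout has elapsed.
      puts format('%s:%d failed after %.2fs (%s)', host, port,
                  Time.now - start, e.class)
    end

    time_connect('127.0.0.1', 27017)    # mongod stopped: fails fast
    time_connect('10.255.255.1', 27017) # dead machine: waits out the timeout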

kuma-giyomu avatar Jan 29 '15 01:01 kuma-giyomu

Bump. Is this still an issue with the latest versions of Mongoid (version 5 or 6)? It's my understanding that Moped got folded into Mongoid?

hosh avatar Nov 22 '16 20:11 hosh

Mongoid now uses mongo-ruby-driver. See https://www.mongodb.com/blog/post/announcing-ruby-driver-20-rewrite.

dblock avatar Nov 22 '16 21:11 dblock