
All connection attempts fail after the server's CNAME changes

Open jonasfa opened this issue 11 years ago • 8 comments

I've been having a lot of downtime because Moped caches each server's IP address.

MongoHQ changes their hosts' CNAME to recover from failures, which have been happening several times a month. When they do, my application hangs until I manually restart all of my application's instances/processes.

The error I get is: Nov 21 08:59:08 application-name app/web.2: MOPED: Could not connect to any node in replica set <Moped::Cluster nodes=[<Moped::Node resolved_address="11.11.111.111:10078">]>, refreshing list. The nodes list didn't change; the node's IP address did.

jonasfa avatar Nov 21 '13 18:11 jonasfa

AFAIK we resolve the names and store IPs in memory, so if there is a DNS change I guess Mongoid won't pick it up. Anyway, I need to double-check the code... I will let you know. If that's the case we will need to revisit it and make sure that we use hostnames instead.

arthurnn avatar Nov 22 '13 16:11 arthurnn

@arthurnn thanks for the response. You're right. I've found the IP address is cached by the Address class.

I'm going to start working on a patch today. I'll probably remove the whole Address class, as its only function is to cache the resolved address. Any considerations before I start? I'm not familiar with Moped's source code yet, but that shouldn't be a problem.

jonasfa avatar Nov 22 '13 18:11 jonasfa

@jonasfa If the DNS changes then you will need to restart your app. If we do not cache the resolved address then we would have to look it up on every request, and the call to Resolv.each_address is very expensive, which would seriously harm application performance. This is why it was cached at all: originally we did not cache, and every resulting request became slow.
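The trade-off durran describes can be sketched in a few lines. This is an illustrative stand-in, not Moped's actual Address class: the host is resolved once via the stdlib Resolv module and the result is memoized, which is exactly why a later DNS change is never observed by the process.

```ruby
require "resolv"

# Hypothetical sketch (not Moped's real implementation): resolve the
# hostname once, memoize the IP, and reuse it forever afterwards.
class CachedAddress
  attr_reader :host, :port

  def initialize(host, port)
    @host = host
    @port = port
  end

  # Resolution happens only on the first call; later calls return the
  # memoized value, so a DNS change goes unnoticed until a restart.
  def resolved
    @resolved ||= begin
      ip = nil
      Resolv.each_address(host) { |addr| ip ||= addr }
      "#{ip}:#{port}"
    end
  end
end
```

The memoized form avoids the per-request Resolv lookup durran mentions, at the cost of the stale-IP failure mode this thread is about.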

The only other option would be to do the address resolution async on another thread periodically over a configured interval - but that is much more complexity than what I think you were originally intending on.
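The asynchronous option durran mentions could look roughly like this. All names here are illustrative assumptions, not Moped API: a background thread re-resolves the hostname at a configured interval, while readers always take the latest known-good IP under a mutex.

```ruby
require "resolv"

# Hypothetical sketch of periodic async re-resolution (not Moped code):
# a background thread refreshes the cached IP every `interval` seconds.
class RefreshingAddress
  def initialize(host, interval: 60)
    @host  = host
    @mutex = Mutex.new
    resolve! # initial, synchronous resolution
    @thread = Thread.new do
      loop do
        sleep interval
        begin
          resolve!
        rescue Resolv::ResolvError
          # keep the last known-good IP on a transient DNS failure
        end
      end
    end
  end

  # Readers never trigger a DNS lookup; they see the latest cached IP.
  def ip
    @mutex.synchronize { @ip }
  end

  def stop
    @thread.kill
  end

  private

  def resolve!
    ip = Resolv.getaddress(@host)
    @mutex.synchronize { @ip = ip }
  end
end
```

This keeps request paths fast (no per-request lookup) while bounding staleness to the refresh interval, at the cost of the extra thread and locking durran alludes to.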

durran avatar Dec 02 '13 16:12 durran

@durran Maybe I'm missing something, but Heroku Postgres + Rails ActiveRecord don't cache IP addresses, and this isn't a performance problem for the thousands of applications hosted there.

Why would Moped+MongoDB be different?

jonasfa avatar Dec 02 '13 16:12 jonasfa

Moped v2 experiences similar production outage scenarios, except they occur when a node is brought online in the replica set.

It appears that Resolv.each_address is called constantly by Moped when the new node comes online. We have been forced to define the names and IP addresses of all MongoDB nodes in /etc/hosts in an attempt to speed up Resolv.each_address.

Even after adding all the MongoDB nodes to /etc/hosts, the Moped clients take 10 times longer to complete requests when a secondary node comes online in the replica set. Only once the application has been restarted do they return to normal performance.

This means that every time a Mongo secondary is brought online, all production application servers have to be restarted in order to maintain Moped/MongoDB performance.

Our application is running JRuby with over 100 active threads in every application instance. Is every thread making its own calls to Resolv.each_address?

Could the use of Resolv.each_address be made configurable, so as to maintain application performance when nodes are brought online?

reidmorrison avatar Apr 21 '14 11:04 reidmorrison

Is this still an issue? Caching DNS with an infinite timeout is broken; a reasonable timeout would be something like 60-90 seconds.
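The bounded-TTL fix suggested here can be sketched as follows. This is illustrative code under the assumptions in this comment (a 60-90 s expiry), not anything Moped ships: lookups are cached, but each entry expires after the TTL, so a replaced instance is picked up without restarting the app.

```ruby
require "resolv"

# Hypothetical TTL-bounded DNS cache (not part of Moped): hits DNS only
# when an entry is missing or older than `ttl` seconds.
class TtlDnsCache
  def initialize(ttl: 60)
    @ttl   = ttl
    @cache = {} # host => [ip, expires_at]
    @mutex = Mutex.new
  end

  def resolve(host)
    @mutex.synchronize do
      ip, expires_at = @cache[host]
      if ip.nil? || Time.now >= expires_at
        ip = Resolv.getaddress(host) # real lookup only after expiry
        @cache[host] = [ip, Time.now + @ttl]
      end
      ip
    end
  end
end
```

Within the TTL window every call is a hash lookup, so the per-request cost durran was worried about stays amortized, while staleness is capped at the TTL instead of the process lifetime.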

I ask because I had to restart apps on like...50 servers today; three hours after I replaced a MongoDB instance. Well, all three, actually, but we set up DNS specifically to avoid issues like this.

There's the name service caching daemon (nscd), which speeds up DNS resolution on Linux systems, if lookup cost is a concern.

gswallow avatar Mar 11 '16 18:03 gswallow

In 2014 we switched to MongoMapper, which uses the standard Mongo Ruby driver, along with the mongo_ha gem, and we have not had any issues since. Feel free to close this ticket.

reidmorrison avatar Mar 11 '16 18:03 reidmorrison

Hello, this issue is still present. Having MongoDB on AWS behind an Elastic Load Balancer (which rotates its public IP every hour) is a real pain. Could the cached IP at least be expired at a configurable interval?

aijanai avatar Sep 08 '16 11:09 aijanai