moped Fix retries and failover

Pull request that fixes the failover and retry mechanisms.

Changes in detail:

Refactoring: move with_retry method to Cluster -- it belongs there as it operates on cluster.
Introduce retries on write operations -- it makes sense, because:
- Update is idempotent
- Delete is idempotent (it deletes the documents matching the query)
- Insert -- in the worst case we could end up with duplicated data. However, moped is used by mongoid, which always inserts documents with _id already present, so a duplicated insert will raise a unique index violation on _id, which is fine.
Fixes failover mechanism -- Node#flush was using ensure_connected, which involves failover. However, processing of database messages after executing operations (and raising errors based on them) happened outside the ensure_connected block, so the failover mechanism wasn't exercised in most of the cases it was meant for.
Removes Reconfigure failover mechanism -- it was raising new exceptions but not retrying -- it should be good enough to just retry.
Refactoring: Move recognition mechanism for some errors from Errors class to Reply class, so errors recognition is in one place.
Fixes refresh mechanism -- if node was successfully refreshed it isn't down any more.
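
To illustrate the Node#flush fix described above, here is a minimal sketch of why the reply check has to run inside the ensure_connected block. It is not Moped's actual code: FakeNode, check!, flush_before, flush_after and both error classes are hypothetical stand-ins, and ensure_connected is reduced to "reconnect once and run the block again".

```ruby
# Hypothetical stand-ins for illustration only; not Moped's real classes.
class ConnectionFailure < StandardError; end
class NotMasterError < StandardError; end

class FakeNode
  attr_reader :reconnects

  def initialize(replies)
    @replies = replies # queue of replies the pretend server will return
    @reconnects = 0
  end

  # Simplified failover: on a failover-worthy error, count a reconnect
  # and run the block a second time.
  def ensure_connected
    yield
  rescue ConnectionFailure, NotMasterError
    @reconnects += 1
    yield
  end

  # Pretend to send the queued operations and read one reply.
  def execute
    @replies.shift
  end

  # Turn an error reply into an exception.
  def check!(reply)
    raise NotMasterError, reply[:err] if reply[:err] == "not master"
    reply
  end

  # Before the fix: the reply is checked outside the block,
  # so the error never reaches the failover logic.
  def flush_before
    reply = ensure_connected { execute }
    check!(reply)
  end

  # After the fix: the check runs inside the block,
  # so a "not master" reply triggers failover and a second attempt.
  def flush_after
    ensure_connected { check!(execute) }
  end
end
```

With the replies `[{ err: "not master" }, { ok: 1 }]`, flush_before raises NotMasterError without reconnecting, while flush_after reconnects once and returns the second reply.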

The outcome of these changes is that you can kill / restart mongo replica-set nodes in whatever order and as often as you like. You can even stop all of them for a couple of seconds (how long is driven by retry_count and retry_interval) and the application will be able to recover without losing any operations or throwing errors.
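
For readers unfamiliar with how retry_count and retry_interval interact, here is a rough sketch of a cluster-level with_retry. It is an assumption-laden illustration, not the code in this pull request: SketchCluster and ConnectionFailure are hypothetical names, and only one error class is retried.

```ruby
# Hypothetical error class; the real driver distinguishes several kinds.
class ConnectionFailure < StandardError; end

# Hypothetical cluster holding only the two retry settings.
class SketchCluster
  def initialize(retry_count:, retry_interval:)
    @retry_count = retry_count       # how many extra attempts are allowed
    @retry_interval = retry_interval # seconds to wait between attempts
  end

  # Run the block, retrying on ConnectionFailure until the
  # retry budget is used up, then re-raise the last error.
  def with_retry
    attempts_left = @retry_count
    begin
      yield
    rescue ConnectionFailure
      raise if attempts_left <= 0
      attempts_left -= 1
      sleep(@retry_interval)
      retry
    end
  end
end

# Usage: an operation that fails twice, then succeeds.
cluster = SketchCluster.new(retry_count: 3, retry_interval: 0)
calls = 0
cluster.with_retry do
  calls += 1
  raise ConnectionFailure, "node down" if calls < 3
  :ok
end
```

In this sketch the longest outage an operation can ride out is roughly retry_count times retry_interval seconds, which is why stopping every node only works "for a couple of seconds".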

Sep 22 '14 14:09 dawid-sklodowski

Pushed this to our staging and it seems to work great with authentication failures / stepdowns etc. (and SSL enabled)!

Sep 23 '14 12:09 matsimitsu

Looks good to me. @arthurnn What do you think?

Sep 23 '14 14:09 durran

+1 Would really like to see one of the PRs that addresses failover pulled soon.

Oct 01 '14 22:10 zarqman

Found one more issue: if you have a replica set and want to re-sync a node (because of disk usage) while the node is in the STARTUP2 state, the connection will fail with the following error:

2014-10-04T10:43:49.441Z 9291 TID-oulq07iok WARN: The operation: #<Moped::Protocol::Commands::Authenticate
  @length=167
  @request_id=54119
  @response_to=0
  @op_code=2004
  @flags=[]
  @full_collection_name="production.$cmd"
  @skip=0
  @limit=-1
  @selector={:authenticate=>1, :user=>"xx", :nonce=>"xx", :key=>"xx"}
  @fields=nil>
failed with error 18: "auth failed"

See https://github.com/mongodb/mongo/blob/master/docs/errors.md
for details about this error.

Steps taken:

shutdown mongodb on a node in a replicaset
remove mongodb data files
start mongodb
mongodb will now re-sync the data from another node in the state STARTUP2

It will keep retrying to authenticate against this node, causing constant failures.

Oct 04 '14 10:10 matsimitsu

+1 this seems to fix this issue, too: https://github.com/mongoid/moped/issues/268

Jan 08 '15 06:01 rakusai

+1 this works for me. Anybody using it in production?

Jan 21 '15 01:01 jperichon

@jperichon, we've been using it successfully in production for 3+ months. We added a couple of patches on top of it to fix up things it missed. Haven't seen any problems with the included commits though--they've been great.

Jan 21 '15 04:01 zarqman