moped
Hangs forever in refresh after 5 minutes when a secondary is unreachable
tl;dr: setsockopt(SO_RCVTIMEO) doesn't work for us in Ruby 2.0/2.1 MRI; infinite hangs result.
We're having trouble in our Mongo cluster when a secondary becomes unreachable (due to e.g. a crashed instance or a network partition): the symptom is that after five minutes all attempts to contact the primary seem to hang indefinitely (or until Unicorn kills our worker process). Note that our mongoid.yml has options: {read: primary}.
Attempting to replicate the problem in a test environment gets us a call stack something like this (some frames omitted for brevity):
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:47:in `read'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:47:in `block in read'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:119:in `handle_socket_errors'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:47:in `read'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection.rb:221:in `with_connection'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/connection.rb:114:in `read'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/protocol/query.rb:213:in `receive_replies'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/node.rb:180:in `block in ensure_connected'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/node.rb:432:in `refresh'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/cluster.rb:182:in `block in refresh'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/cluster.rb:151:in `nodes'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/cluster.rb:240:in `with_primary'
# /home./.bundler/ruby/2.1.0/moped-4c2a0a29122e/lib/moped/read_preference/primary.rb:55:in `block in with_node'
and after some digging the problem seems to be that implementing timeouts using socket options doesn't work in Ruby MRI. Looking at it with strace I see a call to read that returns after five seconds with EAGAIN, followed by a call to ppoll; the EAGAIN never gets as far as interpreted code:
getsockname(7, {sa_family=AF_INET, sin_port=htons(44829), sin_addr=inet_addr("192.168.195.6")}, [16]) = 0
setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, "\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
futex(0x7f24117f6654, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f24117f66d0, FUTEX_WAKE_PRIVATE, 1) = 0
select(8, [7], NULL, [7], {0, 0}) = 0 (Timeout)
read(7, 0x7f2413130f00, 100) = -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=7, events=POLLIN}], 1, NULL, NULL, 8 <hang indefinitely>
See also http://stackoverflow.com/questions/9853516/set-socket-timeout-in-ruby-via-so-rcvtimeo-socket-option
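For what it's worth, the behaviour is easy to reproduce outside Moped. Here is a minimal standalone sketch (hypothetical code, not from Moped; the socket pair and the 5-second timeval are just for illustration) showing a blocking read ignoring SO_RCVTIMEO under MRI:

require 'socket'

# A socket pair where nothing ever writes to the reading end, so a read
# should give up after 5 seconds if SO_RCVTIMEO were honoured.
reader, _writer = Socket.pair(:UNIX, :STREAM)

# struct timeval { tv_sec = 5, tv_usec = 0 }
reader.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, [5, 0].pack("l_2"))

# Under MRI the underlying read(2) does return EAGAIN after 5 seconds, but the
# interpreter treats EAGAIN as "wait until readable" and parks the thread in
# ppoll/select with no timeout, so this call never returns to Ruby code.
reader.read(100)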
The attached patch has a spec that demonstrates the problem (try running it under strace) and a fix which implements read timeouts "by hand" using select() instead of relying on socket options. A fuller solution would probably address the write() case as well and be tested on SSL connections as well as plaintext ones, but I thought I'd push this early diagnosis up for comment first to see if I'm on the right track.
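For discussion, the general shape of the "by hand" approach is something like the sketch below (illustrative only, not the patch itself; the method name, the 5-second default and the Errno::ETIMEDOUT choice are mine):

# Wait for readability with IO.select (whose timeout MRI does honour) and use a
# non-blocking read, instead of relying on SO_RCVTIMEO.
def read_with_timeout(sock, length, timeout = 5)
  data = String.new
  while data.length < length
    begin
      data << sock.read_nonblock(length - data.length)
    rescue IO::WaitReadable
      # Block for at most `timeout` seconds rather than indefinitely.
      unless IO.select([sock], nil, [sock], timeout)
        raise Errno::ETIMEDOUT, "no data received within #{timeout} seconds"
      end
      retry
    end
  end
  data
end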
/cc @dawid-sklodowski @cpoo22 just fyi
Coverage decreased (-0.06%) when pulling c56498f06d745c32fc0bcd8f32250afc75b57213 on simplybusiness:timeout_in_read into 7e5ac3eca7ce508b90d9ee866537367f87f254ee on mongoid:master.
cc @ngsmrk @dcoxall
We experience this issue as well. Would love to see it merged.
@telent, have you been running this in production? I tested it out in production and had some really strange errors (the Moped cursor returning empty documents) that were sporadic and hard to reproduce. I reverted the change for now but will do some more investigation later.
Though to be clear, I had done it by copying the code into a fork of Moped 1.5; I haven't tried it with Moped 2 yet.
I tried testing this again by issuing a scatter-gather query on a massive collection with hundreds of millions of documents. This wreaked havoc on the database: it appears the query kept getting retried but the cursor wasn't closed, so before I CTRL+C'd it we had 19 identical queries running, which caused massive queued reads and blocking. I'm not sure this is safe. I had retrofitted it to 1.5, though, so it may work in 2.0, but I am bearish on this right now given what happened to us.
@durran have you thought about this problem at all in the new Ruby driver? As @telent points out, SO_RCVTIMEO simply does not work in MRI anymore. We experience this issue when we have a bad networking event or when a physical host running multiple mongods freezes or crashes; our processes completely hang and must be kill -9'd. We see an identical stack trace to the one @telent describes in this issue. Reviewing the code for the new driver, I think we'll have the same issue as long as IO.select is not used before the socket read. This pull as it stands is problematic for us, my guess being that if a query takes longer than @timeout to return, the ConnectionFailure causes it to be re-issued (as I mentioned above, we wreaked havoc on a database with this pull and a 10-second timeout), but the underlying hang is a real operational issue that causes really bad problems for us.
@jonhyman I have a branch up on the new driver that actually does an IO.select before selecting a server to use in an operation... (https://github.com/mongodb/mongo-ruby-driver/pull/670/files#diff-ea2082a78a5d9b36b80ce79b7705943dR86) The server.connectable? call eventually calls a select before returning the server, and the operation would execute immediately after.
I may be wrong, but I don't think that's going to help. Going back to Moped and this issue for a second: the Moped code already has a check_if_alive! immediately before the read, but one issue is that alive? is defined like this (in Moped, and it's roughly identical in the new driver):
def alive?
  if Kernel::select([ self ], nil, [ self ], 0)
    !eof? rescue false
  else
    true
  end
rescue IOError
  false
end
so if select returns nil, alive? still returns true. The alive? check has a 0-second timeout, so all you know is that the socket is not dead; if select returns nil, the socket simply has no data on it right now and is not ready for reading. Because the current Moped code does this:
check_for_alive!
handle_socket_errors { super }
you don't know if the socket is ready for reading when you go to do a blocking read, and the super call hangs indefinitely, as @telent's stack trace shows (and we have that exact stack trace when it happens to us).
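To make that concrete, here's a tiny illustration (hypothetical socket pair, not Moped code) of why a zero-timeout select followed by a blocking read still hangs:

require 'socket'

reader, _writer = Socket.pair(:UNIX, :STREAM)   # nothing will ever write to reader

# Zero-timeout select returns nil because there's no data, not because the
# socket is dead, so an alive?-style check still reports true...
Kernel.select([reader], nil, [reader], 0)       # => nil

# ...and nothing stops the subsequent blocking read from hanging forever.
reader.read(100)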
In your pull, connectable? simply does an alive? check, so if select returns nil, signifying that the socket is not ready for reading, connectable? will still return true. Then you'll end up calling @socket.read in https://github.com/mongodb/mongo-ruby-driver/blob/master/lib/mongo/socket.rb#L172, and I think this code will be vulnerable to hanging again (I have not yet tested; your pull is the last one I'm waiting on before we perform failover testing scenarios with the new driver :grin:).
From all my testing locally, I can't find a way to solve this hanging case in Ruby that does not involve using select to set a "maximum read time", i.e. making read_from_socket look something like this:
def read_from_socket(length)
  begin
    # Non-blocking read: raises WaitReadable instead of blocking when no data is available.
    @socket.read_nonblock(length) || String.new
  rescue IO::WaitReadable, OpenSSL::SSL::SSLErrorWaitReadable
    # Wait up to 300 seconds for the socket to become readable, then retry the read.
    ready = IO.select([@socket], nil, [@socket], 300)
    if ready
      retry
    else
      # Fail outright rather than raising ConnectionFailure, so the query is not re-issued.
      raise StandardError.new("Maximum query time exceeded, no data from socket returned in 300 seconds.")
    end
  end
end
Note that here I'm not raising a ConnectionFailure to retry; I'm literally failing, because if it were in fact the case that the query should take longer than the max timeout, a ConnectionFailure would cause the query to get re-issued, and you can get into the loop we got into 4 months ago, when I tested this, ran a dozen and a half huge, slow scatter-gather queries on a very large cluster, and caused high locking/queued reads.