ActiveRecord connection timeouts on non-makara, empty connection pool

Open aks opened this issue 8 years ago • 4 comments

We are evaluating makara as a replacement for our use of octopus within our Rails 4 app using Postgres 9.5.

One unresolved problem in our evaluation is a set of unexplained ActiveRecord connection timeouts whose stack traces do not appear to involve makara at all.

From my review of the makara code, it appears that it "hijacks" and proxies all of the AR DB connections.

Is it expected that some AR connections will not go through makara?

I've attached a stack trace of one of the connection timeout errors. 1.txt

We have other errors that correctly involve makara, but which are not makara-induced. See the second stack trace. m1.txt

For the most part, makara is working correctly, except that we are having these strange DB connection timeouts, for which it appears the connection is not using makara.

Thanks for any insights.

aks avatar Feb 15 '17 00:02 aks

I've reviewed the makara code, and the AR code that was in the stacktrace. Here are my findings:

In makara_abstract_adapter.rb, line 108, makara hijacks the core AR methods:

hijack_method :execute, :select_rows, :exec_query, :transaction

It also wraps some other AR methods to cause their effects to be distributed to all the connections in both the master and slave pools:

send_to_all :connect, :reconnect!, :verify!, :clear_cache!, :reset!
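To make the distinction between the two macros concrete, here is a minimal sketch of how class-level helpers like these could be built. It is not makara's actual implementation: `SketchProxy`, `FakeConnection`, `pick_connection`, and the naive "SELECT goes to a slave" routing rule are all hypothetical names and assumptions made up for illustration.

```ruby
# Hypothetical sketch only; this is NOT makara's real code.
class SketchProxy
  # Define each named method so that the call is routed to ONE chosen connection.
  def self.hijack_method(*names)
    names.each do |name|
      define_method(name) do |*args, &block|
        pick_connection(*args).public_send(name, *args, &block)
      end
    end
  end

  # Define each named method so that the call is repeated on EVERY connection.
  def self.send_to_all(*names)
    names.each do |name|
      define_method(name) do |*args, &block|
        all_connections.map { |conn| conn.public_send(name, *args, &block) }
      end
    end
  end

  hijack_method :execute
  send_to_all :reconnect!

  def initialize(master:, slaves:)
    @master = master
    @slaves = slaves
  end

  private

  def all_connections
    [@master, *@slaves]
  end

  # Assumed routing rule: SELECT statements go to the first slave,
  # everything else goes to the master.
  def pick_connection(sql = nil, *)
    if sql.to_s.lstrip.match?(/\Aselect\b/i) && @slaves.any?
      @slaves.first
    else
      @master
    end
  end
end

# Stand-in connection that records the calls it receives and returns its label.
class FakeConnection
  attr_reader :label, :calls

  def initialize(label)
    @label = label
    @calls = []
  end

  def execute(sql)
    @calls << [:execute, sql]
    label
  end

  def reconnect!
    @calls << [:reconnect!]
    label
  end
end
```

In this sketch only the methods named in the macros are intercepted. Any method that is not listed, such as the one that fetches a connection in the first place, would bypass the proxy entirely.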

However, I notice that in our copy of Rails AR (4.2.6), the find method contains this code in core.rb, starting at line 148:

s = find_by_statement_cache[key] || find_by_statement_cache.synchronize {
  find_by_statement_cache[key] ||= StatementCache.create(connection) { |params|
    where(key => params.bind).limit(1)
  }
}
record = s.execute([id], self, connection).first

The stacktrace shows that the connection call in the argument list of that last line is the one that hangs: it descends through many nested AR methods and ends up waiting for an available connection. However, none of that nested code in the stacktrace involves makara. It's pure AR code, thread-safe, waiting for an available connection behind several nested mutex locks.

When I examine the connection handler code in makara, it becomes clear that the connection pools that makara is managing are distinct from the connection pools that AR is waiting on. Because makara hijacked most of AR's connection management, it's unclear whether AR even has a usable connection pool to work with.
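For readers unfamiliar with this failure mode, the following is a simplified, self-contained sketch of a thread waiting on an empty pool until a timeout fires. It is an illustration only: `TinyPool` and `PoolTimeout` are hypothetical names, and ActiveRecord's real connection pool is considerably more elaborate.

```ruby
# Simplified illustration; this is not ActiveRecord's ConnectionPool.
class PoolTimeout < StandardError; end

class TinyPool
  def initialize(size:, timeout:)
    @available = Array.new(size) { |i| "conn-#{i}" }
    @timeout = timeout
    @lock = Mutex.new
    @returned = ConditionVariable.new
  end

  # Hand out a connection, waiting up to @timeout seconds if none is free.
  def checkout
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @timeout
    @lock.synchronize do
      while @available.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise PoolTimeout, "no connection available within #{@timeout}s" if remaining <= 0
        # Releases the lock while sleeping; wakes on checkin or when time runs out.
        @returned.wait(@lock, remaining)
      end
      @available.pop
    end
  end

  # Return a connection and wake one waiting thread, if any.
  def checkin(conn)
    @lock.synchronize do
      @available.push(conn)
      @returned.signal
    end
  end
end

pool = TinyPool.new(size: 1, timeout: 0.1)
held = pool.checkout # the only connection is now in use

begin
  pool.checkout      # nothing is ever checked in, so this waits and then raises
rescue PoolTimeout => e
  puts "timed out: #{e.message}"
end

pool.checkin(held)
```

If connections are checked out of a pool like this and never returned, every later caller ends up in the `wait` branch, which would look much like the pure-AR stack trace described above.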

Is there some reason that the connection method isn't hijacked to make the connection retrieval also flow through makara?

aks avatar Feb 16 '17 19:02 aks

Thanks for your investigation. To my knowledge (which is somewhat limited), there is no reason. Seems like something to try. It's possible @mnelson remembers something. It's been a while.

bleonard avatar Feb 16 '17 20:02 bleonard

@aks Did you ever figure this out? I'm also facing the same issue and would like to know what you ended up doing. Thanks!

swordfish444 avatar Sep 05 '17 00:09 swordfish444

@swordfish444 -- we switched to evaluating two other gems: octopus and fresh_connection. Both are less complex gems, work with multi-threaded environments, and also manage the cache coherency problem.

aks avatar Sep 11 '17 15:09 aks