rack-robustness
                                
                                
                                
                                    rack-robustness copied to clipboard
                            
                            
                            
                        handle_happy creating new request - why?
Was experiencing some database timeouts in a Rails app - the symptom is ActiveRecord complaining it gave up waiting for a connection to be available in the connection pool.
When we replace the call to handle_happy with return @app.call(env) we aren't able to reproduce the issue.
I notice this was originally the behaviour of this gem - it got refactored out in this commit: https://github.com/blambeau/rack-robustness/commit/049d83a5d48c5c4990c1790dacd36e9b99f23a79
It looks like there may be a very good reason for this which I can't understand - could you possibly explain why we're generating a new request object and 'finishing' it rather than the old behaviour? I'm not able to tell from the commit messages.
I suppose that you mean "handle_happy creating new Response"?
Well, as of this commit (https://github.com/blambeau/rack-robustness/commit/7ecbe5be3240d19dbd0e325ee8f8d5a0c99bdcf0), all callbacks may use the request and response instances. The ensure callbacks in particular are allowed to do so, and hence, a new Response object is also needed in the happy case.
Could you try to create a test that reproduces/demonstrates your issue? I'm not sure to understand what you expect exactly.
Didn't expect such a fast reply - thanks!
I'm really not sure how this causes the issue, and it could be in so many places in the Rails stack I don't think I could quickly isolate this to get a failing test case.
For now we're not needing the callbacks for successful requests so we're monkey-patching the old behaviour in.
If I can find what it is, I'll definitely report back. Cheers!
fwiw, I'm seeing this in my own application: using 049d83a means that my Rails app throws ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds (waited 5.000 seconds). If I switch to using 273b87ef593cac5152160b9c1d0dff53b9375df7 in my Gemfile, everything works just fine.
@indirect I'm not a Rails user, so I must confess that it would be very helpful to have a minimal base to reproduce the bug.
The only reason I see so far would be if Rails monkey patches Rack::Response (do you know if it does?). I'll need to dig further, though. Any help appreciated!
I'll see if I can find more tomorrow, but I think it's related to the way that Rails manages database connections, using a middleware. Here's that middleware: https://github.com/rails/rails/blob/850159bd2c5e1e108d0256dd05424bbbf7926b59/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L642-L645
So it's not monkey-patching it, but if that exact instance gets lost, then the code that closes the database connection gets lost too. Maybe you need to also use RackProxy?
@indirect I don't like knowing that the gem does not play nicely with Rails. Could you send me a stack trace on timeout?
I'm not sure to understand how ConnectionManagement could be the cause as it seems to handle the closing of connections. According to @arrtchiu description, the error seem to be on obtaining connections.
@blambeau the ConnectionManagement middleware manages a pool of connections to the database. During a request, any ActiveRecord requests to the database will check out a connection from the pool and talk to the database. Once the request is complete, the ConnectionManagement middleware checks that database connection back in to the pool, so that the next request can use that connection.
I believe that the current version of rack-robustness prevents the ConnectionManagement middleware from checking connections back in to the pool when the request is complete, because it is possible to make a small handful of requests to a newly started Rails server. The problem is that when no connections are returned to the pool, later requests cannot check out a connection to the database, and so they fail.
The backtrace on a failed request (which, again, is after several requests have succeeded) is in this gist.
@indirect Ok, I see. Thanks for the explanation. I should be able to figure out how to fix that, but I'll probably have to reproduce it first anyway.