graphiti
graphiti copied to clipboard
Add thread pool with promises to limit concurrent sideloading
Previously, sideloading in Graphiti used an unbounded number of threads, which could decrease performance due to resource contention and exhaust the ActiveRecord connection pool, leading to timeouts and connection errors.
This commit introduces a thread pool based on ActiveRecord's
global_thread_pool_async_query_executor
, limiting the maximum number of
resources that can be sideloaded concurrently. With a properly configured
connection pool, this ensures that the ActiveRecord connection pool is not
exhausted by the sideloading process.
Promises are used to manage recursive links between threads, avoiding deadlocks that were reported in earlier attempts.
Single-threaded mode is reintroduced with a new implementation that uses promises in both async and sync modes. The difference lies in whether the thread pool is set to synchronous or not.
Parent thread's local variables are copied to the sideloading threads, ensuring that any thread locals available to the Graphiti resolving thread are also accessible to the sideload threads. Similarly, parent fiber's local variables are copied to the sideloading threads, preserving fiber-local context during sideloading.
Thread pool statistics broadcasting is added for observability. Collecting stats on the thread pool can be useful to understand the state of the thread pool, for example, if many tasks are being queued.
Bump concurrent-ruby
gem from version 1.0 to 1.2+, which includes a fix
for worker threads getting stuck when using the caller_runs
policy
(ruby-concurrency/concurrent-ruby#933).
TODO
- [x] Fix deadlock
- [x] Fix rails integration
- [x] Refactor
ResourceProxy#data
to another method for concurrent - [x] Does save break, i.e. does
resolve_sideloads
return a promise? - [x] Close adapter
- [x] Thread locals
- [x] Thread pool stats broadcast (to be tested)
Closes https://github.com/graphiti-api/graphiti/issues/469
See: https://github.com/rails/rails/blob/94faed4dd4c915829afc2698e055a0265d9b5773/activerecord/lib/active_record.rb#L279-L287 See: https://github.com/graphiti-api/graphiti/pull/470
Still getting a lockup. This sounded possibly related: https://github.com/ruby-concurrency/concurrent-ruby/issues/933
And it looks like graphiti only requires concurrency-ruby ~> 1.0
, when really it should be > 1.2
to include that fix? However, I tried including that and trying it out and it didn't seem to help.
But one thing i realized: one request that's causing a hang has 13 sideloads… which is a lot, admittedly. But still, if there are more sideloads than there are threads, I think that might be a trouble spot?
Still getting a lockup. This sounded possibly related: ruby-concurrency/concurrent-ruby#933
And it looks like graphiti only requires
concurrency-ruby ~> 1.0
, when really it should be> 1.2
to include that fix? However, I tried including that and trying it out and it didn't seem to help.But one thing i realized: one request that's causing a hang has 13 sideloads… which is a lot, admittedly. But still, if there are more sideloads than there are threads, I think that might be a trouble spot?
Thanks for testing that, @jkeen. Good find on that bug. Upgrading the gem is definitely something to consider but as you say, it doesn't seem to have solved the issue.
I'm surprised we're having issues as the code is very similar to that used by ActiveRecord for the new async queries. So perhaps there's something in the way that Graphiti is calling the thread pool that is causing problems. I suspect that's more the case then there being a bug in concurrent-ruby's thread pool.
13 sideloads seems a lot but the default thread pool has 4 threads and a queue of 4 * 4 that (25) and if the queue is expired it's meant to use the calling thread to run the request. I'm wondering if that the :caller_runs option might be where the problem lies.
Are all of the sideloads running ActiveRecord queries?
I think I might need to add some logging to debug the issue.
All the sideloads run active record queries, and some are pulling nested relationships off the sideloads, as in includes=items,items.comments
, etc.
I think figuring out a good logging solution would be great improvement—I dug around the ruby-concurrency docs briefly and couldn't make heads or tails of how one might log when a new thread is getting created or hung up. But if we had some good logs this issue might be much easier to diagnose.
As far as the other options for fallback, I haven't tried :abort or :drop. I would assume abort would make the request throw a 500, and drop would… return incomplete data?
I've added a failing test case that shows the deadlock:
https://github.com/graphiti-api/graphiti/pull/472/commits/e4de93e5108ddefeff8d3f9639a1ca950dccbef8
Excellent!! Nice work on that 🥇
@jkeen do you know how to get the rails tests to run locally? I'm struggling to debug the failures there.
I've tried running:
$ bundle exec appraisal install
Which fails when trying to install the sqlite3 gem
An error occurred while installing sqlite3 (1.4.4), and Bundler cannot continue.
The installed sqlite3 is the default that comes with macOS
$ sqlite3 --version
3.32.2 2021-07-12 15:00:17 bcd014c473794b09f61fbc0f4d9488365b023f16123b278dbbd49948c27calt2
I've also tried using a later version.
@MattFenelon Sqlite is notorious for throwing build errors and I feel like I've spent countless hours of my life troubleshooting system-specific compilation errors with sqlite with ruby and node projects, and with the time in that space I should be an expert, and yet every time it comes along I'm no better equipped than I was the last time.
Did you try installing the gem separately first? This has also helped me in the past: https://github.com/sparklemotion/sqlite3-ruby/blob/main/INSTALLATION.md
@MattFenelon Sqlite is notorious for throwing build errors and I feel like I've spent countless hours of my life troubleshooting system-specific compilation errors with sqlite with ruby and node projects, and with the time in that space I should be an expert, and yet every time it comes along I'm no better equipped than I was the last time.
Did you try installing the gem separately first? This has also helped me in the past: https://github.com/sparklemotion/sqlite3-ruby/blob/main/INSTALLATION.md
Sounds about right 😅
I've managed to work around it by upgrading to version 2 of the sqlite3 gem as that uses native binaries. I also had to upgrade rails to the main branch version (8) but that's got me to a point where I can debug the rails tests.
@MattFenelon Are you pretty satisfied where this work is at now? I haven't tried it again in my staging or production app, just wanted to get your read on it before I give it another spin.
@MattFenelon Are you pretty satisfied where this work is at now? I haven't tried it again in my staging or production app, just wanted to get your read on it before I give it another spin.
I was but there seems to be some flakiness for some of the tests that could be related to the changes but I'm not sure. Do you know if those tests are already flaky?
I have tested the changes locally with a test server with various configurations of rails thread pool, graphiti thread pool and the db connection pool. I've not had a chance to test in production. If you think the test is a red herring, I'd be really interested to hear your experiences of testing it.
I would like to finish this. I plan to come back to it but it probably won't be for a few weeks now.