Don't nest Async primitives inside plain Ruby fibers
Fixes #5463
Sad, it seems like the test suite hangs now...
Ok, I think the problem was that Fiber / Task nesting. I reworked it so that each pass over the set of pending dataloader fibers spins up a new job. This avoids any mixing between Async primitives and plain Ruby Fibers. It messed up some of the fiber counting tests though -- I'll sort them out in the morning.
@iyotov-havelock, could you try this branch in your app's test suite and development? You can bundle it with:
gem "graphql", github: "rmosolgo/graphql-ruby", ref: "async-dataloader-deadlock-fix"
please let me know how it goes!
Hey @rmosolgo, thanks for checking this issue out.
Tested it with my 'real' app in development, sadly there's still an issue there:
Tested it with the demo app (https://github.com/iyotov-havelock/async-dataloader-issue), the No live threads left. Deadlock? error is no longer showing, but the GraphQL response is {"errors":[{"message":"undefined method 'nfields' for nil"}]}, which does not show up when using GraphQL::Dataloader instead of GraphQL::Dataloader::AsyncDataloader
I saw that error, too, and assumed that GraphQL was "working," since it called code as expected. Inspecting the dataset, I see that it was passed the right values:
#<Sequel::Dataset::_Subclass: "SELECT * FROM \"foo_models\" WHERE (\"id\" IN (1, 2))">
My hunch is that Async + Sequel aren't playing nice here ... but do you have a reason to think that AsyncDataloader in particular is the cause of the issue?
Could you please share the full stack trace (NOT a screenshot) from the error in your application? Maybe some lines there will be a clue.
Also, if you can modify the replication app so that it deadlocks on this branch, I'd be happy to keep debugging on that.
UPDATE:
I got the example to run successfully by adding the fiber_concurrency extension:
diff --git a/config/database.rb b/config/database.rb
index 16f3b03..7ec45a8 100644
--- a/config/database.rb
+++ b/config/database.rb
@@ -1,4 +1,5 @@
require 'sequel'
+Sequel.extension :fiber_concurrency
DB = Sequel.connect(
adapter: 'postgres',
Have you tried that extension in your app? I found it by searching randomly around the Sequel repo for the word "Fiber"...
With that addition, I get:
curl -X POST http://localhost:9292/graphql \
-H "Content-Type: application/json" \
-d '{"query": "{ getAllFoos { id name barModels { id name } } getAllBars { id name fooModel { id name } } }"}'
{"data":{"getAllFoos":[{"id":"1","name":"First Foo","barModels":[{"id":"1","name":"Bar 1 for Foo 1"},{"id":"2","name":"Bar 2 for Foo 1"},{"id":"3","name":"Bar 3 for Foo 1"}]},{"id":"2","name":"Second Foo","barModels":[{"id":"4","name":"Bar 1 for Foo 2"},{"id":"5","name":"Bar 2 for Foo 2"}]}],"getAllBars":[{"id":"1","name":"Bar 1 for Foo 1","fooModel":{"id":"1","name":"First Foo"}},{"id":"2","name":"Bar 2 for Foo 1","fooModel":{"id":"1","name":"First Foo"}},{"id":"3","name":"Bar 3 for Foo 1","fooModel":{"id":"1","name":"First Foo"}},{"id":"4","name":"Bar 1 for Foo 2","fooModel":{"id":"2","name":"Second Foo"}},{"id":"5","name":"Bar 2 for Foo 2","fooModel":{"id":"2","name":"Second Foo"}}]}}
@rmosolgo I modified the replication app, check out this commit: https://github.com/iyotov-havelock/async-dataloader-issue/commit/6af81bf63c3ec5b2eafee140f0c07ab4a5c59333
To reproduce the issue, do the getAllFoos query again. It's quite possible to be an Async + Sequel issue
Interesting -- a change like that puts Async primitives back inside a plain Ruby Fiber.new { ... }, further suggesting that's the origin of the issues here. It might be possible to rewrite that part of AsyncDataloader to use Async primitives entirely, rather than plain Ruby fibers, but it was really hard to do last time so I gave up. I'll try again...