rspec-support
rspec-support copied to clipboard
ReentrantMutex hangs with fibers, even after 3.11.0
Subject of the issue
Hi! I'm following up on a reported Rspec hangup with GraphQL-Ruby's "Dataloader," a fiber-based data batching library (Original bug report: https://github.com/rmosolgo/graphql-ruby/issues/4172). The original report also included a replication: https://github.com/khamusa/gql-dataloader-rspec-mocks-hanging/pull/1/commits/fd6c05b4394fe9be72960a5b7548f94caf8d1d71#diff-cd6746fe941432087bd4ad1ba284ce7e6d3f94ef62e3954e8c04fcc007ba3968R3
I see that there has been recent work on this in #503, etc, but even on 3.11.0, I see this stuck process.
I cloned the replication repo and triggered the reported bug, then inspected the stuck ruby process with rbspy snapshot, and here's what I saw:
Stack trace of stuck ruby process
block in spawn_fiber - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/dataloader.rb:304
block in run - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/dataloader.rb:183
block (3 levels) in coerce_arguments - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/member/has_arguments.rb:249
block in evaluate_selection - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:429
evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:530
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:929
block in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:529
call_method_on_directives - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:830
block (2 levels) in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:518
trace - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/tracing.rb:67
block (3 levels) in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:502
resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:747
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block in resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:727
with_extensions - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:836
block (2 levels) in resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:723
public_send [c function] - (unknown):0
resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/relay_classic_mutation.rb:70
resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:116
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:115
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block (2 levels) in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:113
after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033
block (3 levels) in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:112
public_send [c function] - (unknown):0
resolve - /Users/rmosolgo/code/gql-dataloader-rspec-mocks-hanging/spec/debugging_issue_spec.rb:24
block (2 levels) in define_proxy_method - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/method_double.rb:65
proxy_method_invoked - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/verifying_proxy.rb:162
proxy_method_invoked - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/method_double.rb:81
message_received - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/proxy.rb:367
message_received - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/proxy.rb:232
invoke - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:476
invoke_incrementing_actual_calls_by - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:630
call - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:765
map [c function] - (unknown):0
block in call - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:764
block (5 levels) in <top (required)> - /Users/rmosolgo/code/gql-dataloader-rspec-mocks-hanging/spec/debugging_issue_spec.rb:78
block in let - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:343
fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:182
fetch [c function] - (unknown):0
block in fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:181
synchronize - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:26
enter - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:36
lock [c function] - (unknown):0
It seems to me like GraphQL-Ruby is executing a query, which happens inside a newly-created fiber, and then that query calls a mocked method (specifically, a method with a block-style mock), and during that block, it calls a let(...)-ed method, and during that call, the process hangs. (The let(...)ed method never returns.)
I expect this test to continue running, but is there another way it should be written? (Or, maybe something is out-of-sorts with GraphQL-Ruby's use of Fibers?)
Your environment
- Ruby version: 3.0.3
- rspec-support version: 3.11.0
Steps to reproduce
git clone [email protected]:khamusa/gql-dataloader-rspec-mocks-hanging.git
cd gql-dataloader-rspec-mocks-hanging
git checkout bug-demo
bundle
bundle exec rspec
Expected behavior
The let(...) value should return and the test should continue running
Actual behavior
The test hangs forever, with the rbspy snapshot pointing at:
fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:182
fetch [c function] - (unknown):0
block in fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:181
synchronize - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:26
enter - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:36
lock [c function] - (unknown):0
:wave: Seeing as you likely have knowledge of whats going on with fibers in the GraphQL package, can you setup a reproduction without the GraphQL library but just an equivalent usage of Fibers? Is Thread.current disrupted within your usage?
Hey @JonRowe, sure thing, I just worked up a minimal reproduction here: https://github.com/rmosolgo/rspec-support/commit/8a32cffcf155f775821edee6de5529a9c7193907
Interestingly, I found that subject(...) was also required to replicate the problem -- if I moved that call into the test body, it didn't hang!
I don't think there's any messing with Thread.current in the code there.
I don't this is a problem with ReentrantMutex; it is likely a duplicate of this issue with rspec-core: https://github.com/rspec/rspec/issues/14 due how locking semantics changed with Ruby 3.
I actually verified that ReentrantMutex is not the cause in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/102625#note_1156203743 by patching rspec-support to use Ruby's Monitor type, which is also reentrant.
Speaking of which, why does rspect-support not use Monitor to provide reentrancy?
Speaking of which, why does rspect-support not use Monitor to provide reentrancy?
Off the top of my head, Monitor was introduced after Ruby 1.8.7 that we still support in RSPec 3.
It’s an option to use Monitor, and fall back to ReentrantMutex, or just switch to Monitor in RSpec 4. Mind sending a PR to the 4-0-dev branch with your patch?
Ah I see - makes sense :+1:
I wonder if this work is necessary at this point since it is being discussed to remove locks around memoized blocks completely in https://github.com/rspec/rspec/issues/14
I would say let's wait for a decision about what should happen to threadsafe since that is the crux of the matter, not the lock implementation.
If you research the original development of this you'll find Monitor was actually the source of the original ReentrantMutex definition, with the desired chunks being essentially vendored to remove the dependency on it (we consider anything that requires a require a dependency to be minimized to avoid environment poisoning).
Closing this as a duplicate of rspec/rspec#14 during cleanup for the monorepo.