Use a queue for execution
I'm feeling really inspired by @gmac's proof-of-concept over at https://github.com/gmac/graphql-cardinal/, so I thought I'd try again at refactoring execution to use a queue instead of recursive method calls. I have tried before (#4967, #4968, somewhat #3998, #4935) and given up along the way, but I'm going to try again 😅
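To show the shape of the idea, here's a minimal, self-contained sketch of queue-based execution (the `FieldStep` struct and `run_queue` helper are hypothetical illustrations, not graphql-ruby's actual classes): each unit of work becomes an object on a flat queue, and resolving a field enqueues steps for its child selections instead of recursing.

```ruby
# Hypothetical step object: resolves one field, and enqueues follow-up
# steps for child selections rather than calling itself recursively.
FieldStep = Struct.new(:object, :field, :result, :key) do
  def run(queue)
    value = object.public_send(field[:method])
    if (selections = field[:selections])
      child_result = {}
      result[key] = child_result
      selections.each do |name, child_field|
        queue << FieldStep.new(value, child_field, child_result, name)
      end
    else
      result[key] = value
    end
  end
end

# The runner is a flat loop instead of nested method calls, so deeply
# nested queries don't grow the Ruby call stack.
def run_queue(root_object, selections)
  result = {}
  queue = selections.map { |name, field| FieldStep.new(root_object, field, result, name) }
  queue.shift.run(queue) until queue.empty?
  result
end

Post = Struct.new(:title)
Author = Struct.new(:name, :post)
author = Author.new("gmac", Post.new("Queues"))
data = run_queue(author, {
  "name" => { method: :name },
  "post" => { method: :post, selections: { "title" => { method: :title } } },
})
# data == { "name" => "gmac", "post" => { "title" => "Queues" } }
```

A loop like this is also a natural place to pause for Lazy or Dataloader work: a step that can't finish yet just re-enqueues itself instead of blocking a call stack.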
TODO:
- [x] Extract some refactors from this branch
- [x] Make Lazy and Dataloader work
- [x] Consider steps for `continue_value`, `resolve_list_item`
- [x] Use steps for `call_method_on_directives`
- [x] Merge `ResolveTypeStep` into `ResultHash`
- [x] Merge directive checks into steps too?
- [x] Merge `authorized_new` lazy usage into `ResultHash`
- Look for other `after_lazy` usages and rebuild them using queue steps
- [x] Refactor `ResultArray` and `ResultHash` into state machines
- [ ] Merge ResultHash into Schema::Object
- [x] Fix dataloaded arguments
- [x] RunQueue is shared by the multiplex -- can it be merged into one of the other objects with the same lifecycle? (`Multiplex`, `Interpreter`, `Dataloader`)
- [x] Tighter integration between Dataloader, Lazy, and Queue?
- [ ] Merge Field extensions into FieldResolveStep to reduce overhead when used?
- Tests that are still failing:
- [ ] current_field, current_arguments in context
- [ ] rescuing authorized errors
- [ ] batching lazies correctly
- [ ] supporting parallel I/O in dataloader
- ...
There's still a lot to do here for compatibility and performance, but the initial benchmark results are surprisingly good (even with the terrible FieldResolutionStep implementation which is hogging memory). This is using a modified version of the benchmark from graphql-cardinal:
```diff
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-darwin22]
Calculating -------------------------------------
graphql-ruby: 142 resolvers
-  399.549 (± 8.5%) i/s  (2.50 ms/i) - 2.000k in 5.048521s
+  538.453 (± 3.9%) i/s  (1.86 ms/i) - 2.695k in 5.013324s
graphql-ruby: 140002 resolvers
-    0.517 (± 0.0%) i/s   (1.93 s/i) - 3.000 in 5.797333s
+    0.761 (± 0.0%) i/s   (1.31 s/i) - 4.000 in 5.272811s
- Total allocated: 10149304 bytes (70353 objects)
+ Total allocated: 13431224 bytes (97389 objects)
```
So that's a 34% or 47% speedup. It's using 30% more memory, because it's currently the Simplest Thing That Could Possibly Work. Presumably refactoring it to avoid those repeated allocations will make it even faster!
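For the record, those percentages fall straight out of the numbers in the output above:

```ruby
# Speedup = new i/s ÷ old i/s − 1; memory growth computed the same way
# from the allocated-bytes totals.
small_speedup = 538.453 / 399.549 - 1     # ≈ 0.35 (the ~34% figure, 142 resolvers)
large_speedup = 0.761 / 0.517 - 1         # ≈ 0.47 (the 47% figure, 140002 resolvers)
memory_growth = 13431224.0 / 10149304 - 1 # ≈ 0.32 (the ~30% extra memory)
```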
I haven't pushed a commit here for a while but I'm still planning on landing this somehow. Here's a quick brain-dump in case anyone is curious:
- The perf win is not as big as it was initially because I added objects (`FieldResolveStep`) for each field resolution. Amazingly, it's still faster than the previous implementation, but it's 20%-30% IIRC.
- It might get slower, because the shift to this kind of flow will also require queue objects for argument resolution (basically anywhere that can return a Promise).
- The time tradeoff comes at the expense of memory, which I think is worth it. I have some ideas for reducing memory once this is really working:
  - Merge `GraphQLResultHash` into `GraphQL::Schema::Object` -- they have the same lifecycle at runtime, so they could be merged.
  - Merge `Field::ExtendedState` into `FieldResolveStep`; this will save an allocation in possibly-tight loops when field extensions are added to repeatedly-used fields.
  - Introduce "optimized" field execution. Sometimes we can detect when a field could use a much simpler execution flow, e.g. "just call a method and insert the result into the hash", without preparing arguments, running extensions, etc. This could cover a lot of basic fields and speed things up, but we still have to be ready for Promises and Dataloader pauses.
- Merging and releasing this will be tricky. I expect to have full compatibility when I'm done (or very-nearly-full 😅 ) but it's such a huge change that I want to make sure it's a gentle release so nobody gets nasty surprises. Some thoughts:
- Many of the changes in this branch could be isolated and released separately: merging Dataloader and Lazy resolution; merging ResultHash and Schema::Object; moving execution code into named methods on Step classes.
- After that, you could switch from stack-based to queue-based using code that was mostly in place. Ideally both implementations would be available (and runtime-swappable) but I think the implementation will have to reveal itself once the other changes have been merged.
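To make the "optimized" field execution idea above concrete, here's a hypothetical sketch (none of these names are graphql-ruby's actual API): detect fields with no arguments or extensions, call the method directly, and only fall back to the full step pipeline when the value needs deferral.

```ruby
# Stand-in for anything that must pause execution (a Promise or a Dataloader source).
class Deferred; end

# Hypothetical fast path: skip argument preparation and extensions when the
# field doesn't use them, but stay ready to hand work back to the full queue.
def resolve_field(object, field, result, queue)
  if field[:arguments].to_a.empty? && field[:extensions].to_a.empty?
    value = object.public_send(field[:method])
    if value.is_a?(Deferred)
      queue << [:full_resolve, object, field, result] # re-enter the slow path
    else
      result[field[:name]] = value # "just call a method and insert the result"
    end
  else
    queue << [:full_resolve, object, field, result]
  end
end

Article = Struct.new(:title)
result = {}
queue = []
resolve_field(Article.new("Queues"), { name: "title", method: :title }, result, queue)
# result == { "title" => "Queues" } and the queue stays empty
```

The point of the `Deferred` check is the caveat from above: even a "simple" field can hand back a Promise or hit a Dataloader pause, so the fast path has to be able to bail out into the regular queue.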