Proposal: deterministic batch loading
Context
Dataloaders
As described in the documentation, calls to databases or other external services need to be batched to achieve decent performance. This is generally done using the dataloader pattern.
In Go, the dataloader implementations (dataloaden, graph-gophers/dataloader) work roughly this way:
- dataloader.Load(id) is called
- dataloader.Load blocks for a set amount of time (for example 5ms)
- once that time has elapsed, the user-provided Fetch function is called with the ids passed to all the dataloader.Load calls made during the blocking window; Fetch returns the values, and the blocked dataloader.Load calls then return them
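For reference, a minimal sketch of that pattern in Go (illustrative only; the UserLoader type, its fields and the matching of results by id are placeholders, not the actual dataloaden or graph-gophers/dataloader API):

import (
	"sync"
	"time"
)

type User struct{ ID string }

// Illustrative dataloader: Load blocks until the wait window closes, then a
// single Fetch call resolves every id queued during that window.
type UserLoader struct {
	Wait  time.Duration                       // batch window, e.g. 5ms
	Fetch func(ids []string) ([]*User, error) // user-provided batch function

	mu      sync.Mutex
	pending []string
	done    chan struct{}
	results map[string]*User
}

func (l *UserLoader) Load(id string) (*User, error) {
	l.mu.Lock()
	if l.pending == nil {
		// First Load of a new batch: schedule the flush after the wait window.
		l.done = make(chan struct{})
		l.results = make(map[string]*User)
		time.AfterFunc(l.Wait, l.flush)
	}
	l.pending = append(l.pending, id)
	done, results := l.done, l.results
	l.mu.Unlock()

	<-done // block until the window has elapsed and Fetch has run
	return results[id], nil
}

func (l *UserLoader) flush() {
	l.mu.Lock()
	ids, done, results := l.pending, l.done, l.results
	l.pending = nil
	l.mu.Unlock()

	users, _ := l.Fetch(ids) // error handling elided for brevity
	for _, u := range users {
		results[u.ID] = u
	}
	close(done)
}

Every Load call in a batch pays the full wait window, and every call goes through a mutex, which is where the drawbacks below come from.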
This approach has several drawbacks:
- each call to dataloader.Load that starts a new batch blocks N ms, which means nested queries take at least depth * N ms
- dataloader.Load calls need to be thread-safe, which results in significant overhead
- most importantly, under load (including for large GraphQL queries), unless the wait time is excessively large, the dataloader.Load calls will be made too slowly to fall inside the same batch window, resulting in poor batching, reduced performance and increased database load
gqlgen
gqlgen builds the query results depth-first, top to bottom, and spawns a goroutine for each field in a field set and for each element in an array (with a few optimizations). For example, the query query { users { stars likes } } roughly results in: Users(); go marshallUser(user1); go Stars(user1); go Likes(user1); go marshallUser(user2); ...
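Roughly, that fan-out behaves like the following simplified sketch (reusing the illustrative User type from the sketch above; the resolver stubs are hypothetical and this is not gqlgen's actual generated code):

import (
	"context"
	"sync"
)

// Stand-ins for the per-object field resolvers.
func resolveStars(ctx context.Context, u *User) int { return 0 }
func resolveLikes(ctx context.Context, u *User) int { return 0 }

// Simplified fan-out for query { users { stars likes } }: one goroutine per
// user, plus one goroutine per field of each user.
func marshalUsers(ctx context.Context, users []*User) {
	var wg sync.WaitGroup
	for _, u := range users {
		u := u
		wg.Add(1)
		go func() {
			defer wg.Done()
			var fields sync.WaitGroup
			fields.Add(2)
			go func() { defer fields.Done(); resolveStars(ctx, u) }()
			go func() { defer fields.Done(); resolveLikes(ctx, u) }()
			fields.Wait()
		}()
	}
	wg.Wait()
}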
This compounds the dataloaders drawbacks since:
- the goroutines aren't spawned in the right order to optimize batching (ideally all the calls to the Stars resolver would be made first, then all the calls to the Likes resolver)
- the approach can result in spawning a large number of goroutines (for example 3000 goroutines for 1000 users), further adding overhead
Previous discussion
https://github.com/99designs/gqlgen/issues/518
Proposal
Add built-in batch-loading to gqlgen. Instead of having the following resolver signature:
func (r *itemResolver) Children(ctx context.Context, obj *model.Item, ...) ([]*model.Item, error)
have this one:
func (r *itemResolver) Children(ctx context.Context, obj []*model.Item, ...) ([][]*model.Item, []error)
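For illustration, a resolver with the proposed signature can then issue a single backend round trip for all parents. A rough sketch, with the extra field arguments omitted and fetchChildrenByParentIDs a hypothetical data-access helper returning children keyed by parent id:

func (r *itemResolver) Children(ctx context.Context, objs []*model.Item) ([][]*model.Item, []error) {
	ids := make([]string, len(objs))
	for i, obj := range objs {
		ids[i] = obj.ID
	}

	// Hypothetical helper: one database query for every parent at once.
	byParent, err := fetchChildrenByParentIDs(ctx, ids)
	if err != nil {
		errs := make([]error, len(objs))
		for i := range errs {
			errs[i] = err
		}
		return nil, errs
	}

	// One result slice per parent, in the same order as the input.
	out := make([][]*model.Item, len(objs))
	for i, obj := range objs {
		out[i] = byParent[obj.ID]
	}
	return out, nil
}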
Proof of concept (simple schemas work; nulls, error handling, introspection, middlewares and likely many other things don't).
From an implementation standpoint, this would be done by walking the query tree depth-first, calling each resolver once per node of the query. Taking the example above, this would result in roughly the following execution plan: Users(); go Stars(users); go Likes(users).
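A rough sketch of such a walk, with hypothetical types (the real gqlgen execution machinery would look quite different):

import (
	"context"
	"sync"
)

// queryNode is one field of the parsed query; resolve handles that field for
// every parent object in a single call, as in the proposed signature.
type queryNode struct {
	resolve  func(ctx context.Context, parents []any) ([][]any, []error)
	children []*queryNode
}

// walk calls each resolver exactly once per query node, passing all parents
// together, then recurses into sibling fields concurrently.
func walk(ctx context.Context, node *queryNode, parents []any) {
	groups, _ := node.resolve(ctx, parents) // error handling elided
	var flat []any
	for _, g := range groups {
		flat = append(flat, g...)
	}
	var wg sync.WaitGroup
	for _, child := range node.children {
		child := child
		wg.Add(1)
		go func() { defer wg.Done(); walk(ctx, child, flat) }() // e.g. go Stars(users); go Likes(users)
	}
	wg.Wait()
}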
Advantages:
- Queries are batched optimally and deterministically
- Few goroutines are spawned and there is no need for synchronization in the resolvers
- The right way to do things (batching) is the default way to do it
- This can easily be made backwards-compatible by generating code that calls the "old-style" resolvers in a loop, as sketched below
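For example, the generated compatibility glue could look roughly like this (sketch only; childrenBatch is a hypothetical name for the generated wrapper, the extra field arguments are omitted, and Children is the user's existing old-style resolver):

// Generated glue: satisfy the batched signature by looping over the existing
// one-object-at-a-time resolver.
func (r *itemResolver) childrenBatch(ctx context.Context, objs []*model.Item) ([][]*model.Item, []error) {
	results := make([][]*model.Item, len(objs))
	errs := make([]error, len(objs))
	for i, obj := range objs {
		results[i], errs[i] = r.Children(ctx, obj) // old-style resolver, one call per object
	}
	return results, errs
}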
Drawbacks:
- If the same data is requested in different parts of the query, it can be fetched only once with dataloaders, but not with this approach
- Implementing this in gqlgen would be pretty involved
Benchmark
Test query:
query q {
  root {
    children(num: 100, latency: 0) {
      children(num: 100, latency: 0) {
        children(num: 100, latency: 0) {
          id
        }
      }
    }
  }
}
The children resolver returns an array of num items, resulting in 100x100x100 = 1,000,000 leaf items.
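For reference, the benchmark's children resolver is essentially a synthetic fan-out along these lines (a reconstructed sketch that assumes latency is in milliseconds; the actual benchmark code may differ):

// Synthetic resolver: returns num fake children named after their parent and
// optionally sleeps to simulate backend latency.
func (r *itemResolver) Children(ctx context.Context, obj *model.Item, num int, latency int) ([]*model.Item, error) {
	time.Sleep(time.Duration(latency) * time.Millisecond)
	children := make([]*model.Item, num)
	for i := range children {
		children[i] = &model.Item{ID: fmt.Sprintf("%s_%d", obj.ID, i)}
	}
	return children, nil
}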
Tested implementations:
- gqlgen + dataloaden
- proof of concept
Execution plan
Each line in the logs below is a call to the dataloader fetch function (gqlgen + dataloaden) or to the resolver (proof of concept); p0 is the id of the first item in the argument array, and the number at the end is the length of the argument array.
gqlgen+dataloaden:
Items (p0: root): 1
Items (p0: root_99): 100
Items (p0: root_99_99): 352
Items (p0: root_91_99): 489
Items (p0: root_81_99): 762
Items (p0: root_67_99): 1000
Items (p0: root_49_99): 1000
Items (p0: root_31_99): 1000
Items (p0: root_13_99): 1000
Items (p0: root_97_29): 289
Items (p0: root_91_39): 511
Items (p0: root_79_65): 638
Items (p0: root_67_50): 509
Items (p0: root_53_26): 142
Items (p0: root_48_82): 774
Items (p0: root_31_34): 800
Items (p0: root_13_89): 533
Items (p0: root_92_73): 26
Items (p0: root_54_60): 63
Items (p0: root_50_69): 86
Items (p0: root_32_80): 26
Batching works well at the beginning (1000 is the max batch size), but degrades as processing progresses.
proof of concept:
Items (p0: root): 1
Items (p0: root_0): 100
Items (p0: root_0_0): 10000
The optimal number of calls is made.
Rough performance numbers
With GOMAXPROCS=1, the proof of concept does 570ms/req and gqlgen+dataloaden does 2.2s/req. With GOMAXPROCS=4, the proof of concept does 420ms/req and gqlgen+dataloaden does 950ms/req.
Note: a real-life implementation of the proof of concept could:
- be a little slower, since several things such as null handling and error handling aren't implemented in the proof of concept
- or be a little faster, because the proof of concept does some unnecessary slice resizing and allocations
Discussion
Is this something that would fit with gqlgen?
Any alternative approaches or suggestions?
Appreciate the feedback.
Would love to see this. It would be majorly breaking, but probably worth it!
Agreed, maybe gqlgen could allow you to opt into it per resolver?
I'm wondering if there is a coroutine solution here with the new iter.Seq and iter.Pull? Haven't thought too deeply on it yet, but I have been interested in more "coroutine"-like batching for gqlgen.