graphql-remote_loader icon indicating copy to clipboard operation
graphql-remote_loader copied to clipboard

Queries for Nested Related Data

Open SJTucker opened this issue 4 years ago • 3 comments

When trying to query for a collection that has a remote relationship, I'm noticing that we have an inefficient query happening in the remote service where the related objects live. In a graphql query requesting a paginated collection of Object 1s that has a foreign key for Object 2 (something like query { object1Collection {... object2 {...} } }), where Object 2 is located in a remote service, each instance of Object 2 is retrieved one by one rather than all together (like Object2.where(id: [array_of_ids]).

In looking through the graphql-remote_loader source code, it looks like this was an intentional design decision so that arguments aren't collapsed so you can request different properties for each instance. For our use of the gem, I don't think we'd ever have a need for that functionality, but rather we'd have multiple uses where we'd like arguments to be collapsed in such a way that instead of getting query { object2(id: 1) {...} } { object2(id: 2) {...} } etc, we would get query { object2(id: [1,2]) {...} }.

Hopefully I've explained this in a way that makes enough sense, and let me know if I need to expand further.

I'm wondering if I'm missing something about how to get the gem to work for that kind of scenario and if you have any advice on how to address it, or if this just not an intended use of the gem (though I'm thinking this would be a fairly common scenario). If not, obviously we could fork the gem and change how that part works for ourselves, but I wanted to check with you first to see if I was misunderstanding something.

Thanks

SJTucker avatar Sep 14 '20 04:09 SJTucker

Hey @SJTucker !

If I'm understanding correctly, you have a resolver that's doing something like this?

def foo(id:)
  MyLoader.load("users(id: [#{id}]) { ... }")
end

This isn't something we handle right now because it's not a safe assumption that we can always collapse these cases, the GraphQL spec doesn't promise us that array arguments will work like this in all cases. In cases where the argument is a filter, or some other non id argument, we could get weird things happening.

I still think it's needed because of the N+1 you're hitting, but it can't be automatically merged. If there was a way to manually say "I want this argument to be "mergeable", did you have an API in mind for how this might work?

I need to do a little more thinking on this, there's also the challenge of plucking out the correct record from the list when yielding back to your resolver. If users(db_id: [1 ,2]) returns two records but you didn't fetch ID (or the type doesn't have a db_id type on it), it's challenging to know which of the two records is for which caller.

d12 avatar Sep 14 '20 14:09 d12

That's correct. I haven't considered an api for this. Something like MyLoader.load("users(id: [#{id}]) { ... }", mergeable_arguments: true) or prefixing mergeable arguments with mergeable (users(mergeable_id: [#{id}])} are the first ideas that come to my mind, but I'm not sure right now if either of those are the best.

Also, that's a great point about having to map the results back to the resolver that corresponds to just an id. That sounds tricky. I don't know how you'd track that without making it an array of hashes (though I'm thinking this wouldn't work out very well. Wondering if there's a clever way to append something to the aliases to track that.

{
  p2_object: object {
    p2_id: id
    p2_user_merge_id_1: user { ... }
  }
  p3_object: object {
    p3_id: id
    p3_user_merge_id_2: user { ... }
  }
}

Forgive me if I'm butchering how the aliasing actually works - haven't spent a lot of time in the source. And I'm not sure if that or any of these ideas are useful lol, but they're the first things that came to my head.

SJTucker avatar Sep 14 '20 17:09 SJTucker

After thinking through this some more, I'd suggest seeing if you can implement this batching on the underlying GraphQL API, not within this gem. For a query like:

{
  one: users(id: [1]) { edges { node { name } } }
  two: users(id: [2]) { edges { node { name } } }
}

The underlying GraphQL API should be able to batch these two users calls into one database query. Using ruby, the graphql-batch gem will do this. For other API backends there should be similar promise-based solutions.

For this gem to be able to batch this before sending the query off, we need to be able to split the result properly but we cannot in this case. Say we pass the above example query through the merger and get:

{
  users(id: [1, 2]) { edges { node { name } } }
}

The response will look like:

{
   "data":{
      "users":{
         "edges":[
            {
               "node":{
                  "name":"Username A"
               }
            },
            {
               "node":{
                  "name":"Username B"
               }
            }
         ]
      }
   }
}

But we have no way of telling which caller needs the "Username A" object and which caller needs the "Username B" object. A core part of this gem is splitting the response properly and not sending extra JSON keys to callers, it's required for proper GraphQL error handling.

The way the gem normally solves this is with clever aliasing, but we can't apply different aliases to each result of a field that returns an array, that's not possible in the GraphQL spec.

Let me know if I'm missing something obvious, but the route forward here will be database query batching in the underlying API backend :)

d12 avatar Sep 24 '20 23:09 d12