graphql-go icon indicating copy to clipboard operation
graphql-go copied to clipboard

Expose the tree of field subselections to the resolver on demand

Open xmirya opened this issue 7 years ago • 44 comments

What's requested in https://github.com/neelance/graphql-go/issues/17 .

Use case: we have Order and Owner entities stored in the DB, each Order refers to one Owner. We want to expose the API to list orders, optionally providing owners info, e.g.

type QueryRoot {
  orders(some criteria): [Order!]
}

type Order {
  field1
  field2
  ...
  owner: Owner!
}

type Owner {
  field1
  field2
  ...
}

W/o knowing whether the user has requested Owner info in the orders resolver we end up with doing 1 query for the orders list + N queries for each Owner (assuming each Order has a different Owner). If orders resolver might check ahead of time whether Owner is requested, it might end up running just one (in case of relational DB, using INNER JOIN) or two (in the common case, one to fetch the orders, second to fetch owners by the list of IDs found from the 1st query) queries.

This is well aligned with how many ORMs allow related entity preloading, like orm.GetOrders(...).With("Owner"), so order.GetOwner() later does not result in a separate query.

xmirya avatar Feb 25 '18 23:02 xmirya

If I'm interpreting semaphoreci setup errors correctly, someone needs to fix semaphoreci-github integration for this to build to pass?

darh avatar Mar 10 '18 13:03 darh

SemaphoreCI hasn't really handled the github.com/neelance/graphql-go -> github.com/graph-gophers/graphql-go transition very gracefully :(

I'll look into what I can do to straighten it out.

tonyghita avatar Mar 10 '18 15:03 tonyghita

It looks like SemaphoreCI issue is fixed. Anything else that is stopping us from merging this?

darh avatar Mar 20 '18 06:03 darh

@tonyghita @xmirya 👋

darh avatar Mar 26 '18 11:03 darh

I see no reasons for this not to be merged (naturally, as I filed this PR :) ); I use this patch for some time in one of private projects

xmirya avatar Mar 26 '18 11:03 xmirya

@xmirya I applied this patch to our fork along with a basic test: https://github.com/qdentity/graphql-go/blob/5472fce1344b4de8067df6cf53d09384c6533ff3/graphql_test.go

dvic avatar Mar 27 '18 11:03 dvic

@xmirya Added support for wrapper types, i.e., can be any type X []pubquery.SelectedField

https://github.com/qdentity/graphql-go/commit/d95424a79e896f47cc5a3ea13e443f83022432a7

dvic avatar Mar 28 '18 10:03 dvic

@xmirya

Hey,

Thanks for the PR. I also came up with a solution but I like your solution better.

I have a couple of suggestions:

  1. Wouldn't it be better to put selection set in the context to avoid an extra argument in resolver method? Context could also be passed down with selection set intact and downstream can benefit from it if and when needed. I think this approach would make the api more generic.

  2. I think it would be nice if selection set also included variables for a field if there are any. This would be very useful for querying graph DBs (i.e., DGraph) and I am sure queries to SQL DBs could also benefit from it. For example, the following schema

schema {
	query: Query
}

type Query {
	user(id: ID!): User!
}

type User {
	id: ID!
	name: String!
	email: String!
	phone: String
	address: [String]
	friends(page: Pagination): [User!]
}

input Pagination {
  first: Int
  after: ID
  last: Int
}

with GraphQL Query

query GetUser($id: ID!, $page: Pagination) {
	user(id: $id) {
		name
		email
		phone
		address
		friend(page: $page) {
			id
			name 
			email
		}
	}
}

would benefit from a selection set having variable values for each field. When you build the SQL/Graph query for db from the selection set, you could identify the pagination values for the field friends and able to specify limit, offset, filtering etc. I am sure there would be other use cases where it would be useful.

Looking forward to your reply, thanks!

0xSalman avatar Mar 29 '18 12:03 0xSalman

Hey, thanks for the contribution!

I'm still getting up to speed with the implementation details of execution but I'd like to get this feature in. I believe we have a similar PR in #70.

It'd be nice to include some unit tests to cover the new behavior, and perhaps a benchmark to get an idea of how much additional overhead the feature adds (if any).

tonyghita avatar Apr 09 '18 21:04 tonyghita

@salmana1 For including it into the context: i'd not do it for three reasons: (1) the optional parameter allows constructing the tree only when the method requests it by specifying the optional parameter; with the context solution we'd have to either always construct it or pass some factory function that constructs it (which is a bit ugly IMO) (2) the optional parameter solution ensures the runtime and parsing performance for existing code does not change (3) IMO passing whatever as a context parameter under some key is not "backward compatible", whatever key name is chosen, it still might collide with someone using it in the existing code.

For variables in subselections - will add it to the patch.

@tonyghita - will add tests (prob. would be nice to provide an example/update docs as well); for benchmarking - i expect it not to change the runtime performance of the existing 1- or 2-parameter methods - the difference is one "if" branch; same about memory use - Go booleans are packed AFAIK; but might be useful to show how adding this optional parameter affects the GQL method call speed.

xmirya avatar Apr 09 '18 21:04 xmirya

@xmirya

Thanks for replying back and clarifying. After commenting here, I thought about the context approach again and came up with same conclusion as yourself. I am liking the optional argument approach more now. The patch with variables would be nice. If you haven't started working on the patch yet, I can create a PR.

@tonyghita

It would be awesome if you could please merge this PR soon. I think we can follow up with another PR to add unit tests and benchmark results.

0xSalman avatar Apr 10 '18 00:04 0xSalman

I left some comments on this subject here: https://github.com/graph-gophers/graphql-go/pull/70#issuecomment-379957784

As I mentioned there, I believe that the context is the idiomatic place to put these fields even if it requires a constructor function. We could also used a typed key for the context variable so that there is no possible collision.

That said, I don't feel so strongly about this as to prevent this PR from being merged if the general consensus is that an extra argument is preferred to using the context.

nicksrandall avatar Apr 10 '18 03:04 nicksrandall

I think people will use selection set more often than not. Perhaps we could achieve best of both worlds by introducing a configurable option via SchemaOpt. Construct the selection set tree and put it in the context only when it is configured. This approach will make sure that there is no performance impact to existing code and there is an idiomatic solution.

What do you guys think?

0xSalman avatar Apr 10 '18 10:04 0xSalman

I haven't experienced a need for the selection set in resolvers, but I can see how this may be important if your resolvers directly communicated with a database.

Is anyone able to provide an example of how they would use this functionality in their resolver code? One or more concrete code examples could help guide this discussion.

tonyghita avatar Apr 10 '18 16:04 tonyghita

My situation (and I know I'm just repeating what's already been mentioned here) : I have semi complex relation between user (owner) and a couple of other entity types (campaigns, assets, content) + spme relation between these entities. All this is stored in the rdbms. Atm I have two options: 1) preload all relations down to Nth level or 2) load relations (query the db) of each node.

None of those 2 is optimal. If I would be able to get the (whole tree) of selected fields at least in the 1st level resolver ( but preferably in any resolver) I could make a smart preloader of relations that would load just enough.

I have already everything in place, all I need now is this tree of selectors :)

I would prefer (just a personal preference) it is passed by argument directly to the resolver func (optional if detected by reflection).

darh avatar Apr 10 '18 17:04 darh

@tonyghita

We are using DGraph as a database and having the selection set helps us convert a GraphQL query into a DGraph query. Or at least know which fields/nodes we need to pull from the DB. We do not want to pull all the nodes & reverse nodes and then figure out what to send back to client.

May I ask, how are you interacting with db? Are you pulling everything from DB and filtering what to send back to client in the app?

0xSalman avatar Apr 10 '18 17:04 0xSalman

@salmana1 We don't interact with any databases directly—instead we make service requests to hundreds of data services which front databases.

The databases are denormalized (generally joins happen in the application layer). We haven't really had to care if we are over-fetching from services since it all happens within the same network, so generally the costs of over-fetching at this level are neglible.

This is the approach I've tried to demonstrate in https://github.com/tonyghita/graphql-go-example

tonyghita avatar Apr 10 '18 18:04 tonyghita

Hi,

@tonyghita My use case (currently we are using apollo-server with nodejs, providing the possibility to get tree field on the resolver level). We are creating an application for displaying TV programs and associated videos.

The (simplified) schema :

type Video {
  id: ID!,
  title: String!
  duration: Int!,
  img: String!
}

type VideosConnector {
  offset: Int!,
  hasNext: Boolean,
  total: Int!,
  items: [Video!],
}

type Program {
  id: ID!,
  name: String!,
  category: String!
  img: String!,
  videos(offset: Int = 0, limit: Int = 20): VideoConnectors
}

type ProgramConnector {
  offset: Int!,
  hasNext: Boolean,
  total: Int!,
  items: [Program!], 
}

type Query {
  getLastPrograms(offset: Int = 0, limit: Int = 20): ProgramConnector,
  getProgramsByCategory(category: String!, offset: Int = 0, limit: Int = 20): ProgramConnector,
  searchPrograms(search: String!, offset: Int = 0, limit: Int = 20): ProgramConnector,
  getProgramById(id: ID!): Program,
}

Screens : The homepage of the app displays the last programs, and a selection of programs ordered by categories. For each programs the app display the program name, his picture and the total of available videos. The request looks like that :

fragment Prg on Program {
  id name category img
  videos { total }
}

{
  getLastPrograms {
    total offset hasNext
    items { ...Prg }
  }
  series : getProgramsByCategory(category:"series") {
    total offset hasNext
    items { ...Prg }
  }
  tv_show : getProgramsByCategory(category:"tv_show") {
    total offset hasNext
    items { ...Prg }
  }
  news : getProgramsByCategory(category:"news") {
    total offset hasNext
    items { ...Prg }
  }
}

When an user select a program the app displays a program page with all the associated videos. The request looks like that :

{
  getProgramById(id:"1234-5678-01234") {
    id name category img
    videos {
       total offset hasNext
       items {
          id title img duration
       }
    }
  }
}

Resolvers : In these requests, the app asks for VideoConnector node. In the first case, just to get the total of videos associated to a program, in the second to get the all the videos.

What we have done with apollo-graphql : In the VideoConnector resolver, if the items node is not requested, we use a dataloader (executing a count request) to get all the video counters at once. If items node is requested, we execute a select request for the program in the video table to retrieve all the informations of videos.

Using a dataloader and count request in the first case is more efficient. It's why we need to have access to the tree of field to achieve these kinds of optimization.

I hope my use case will help you !

ldechoux avatar Apr 18 '18 09:04 ldechoux

So, where are we with this? :) Planning to push my API in (pre)production mid-May and I would really like to have this support built in.. ty.

darh avatar Apr 22 '18 09:04 darh

Any update?

lubo avatar Jun 05 '18 14:06 lubo

+1

karlos1337 avatar Jun 22 '18 11:06 karlos1337

In my opinion, this is the number one feature missing from graphql-go. If I have highly-interconnected data, how do I draw the line of what to preload and what not to preload? I can't preload everything—that graph is infinite. And I don't want to generate dynamic queries for every nested object, as that could potentially lead to hundreds of extra queries. I need to be able to see the subtree of fields that this resolver and its children care about in order to make intelligent optimization decisions.

ghost avatar Jun 22 '18 21:06 ghost

Any update? I rly need this feature

OoXoSoO avatar Jun 29 '18 09:06 OoXoSoO

Any update? This is indeed a make-or-break feature for me as well- I can't get any semblance of good performance without this.

abradley2 avatar Aug 09 '18 23:08 abradley2

Just ran into this issue ourselves, found it strange that it wasn't already in the lib. But great that it's close to merge, ship it 🚀 :)

avocade avatar Oct 05 '18 10:10 avocade

This is an important and repeatedly requested feature. Is there anything specifically preventing this PR from being merged? Anything we can help?

lfv89 avatar Jan 22 '19 16:01 lfv89

@nicksrandall @pavelnikolov @tonyghita ?

choonkeat avatar Jan 23 '19 01:01 choonkeat

This would be so good for the library :) . Are there any news regarding this PR?

gragera avatar Apr 25 '19 08:04 gragera

👍 Definitely would like to see this get implemented, it would make queries from resolvers far more efficient in some use cases.

Having it be optional will allow certain upstream resolvers still follow natural convention, but allow DB queries to be a lot faster.

Note

I think, should this get implemented, it would be highly worthwhile calling out certain traps one could get caught in by relying on this (especially if they want to use dataloader).

For example:

query {
    authors {
        id
        name
        books {
            id
            name
            relatedBooks {
                id
                name
                isbn
            }
        }
    }
}

The major benefit to this query IMHO would not be the fact that I need not blindly SELECT * from authors, but that I could under-the-hood JOIN on the books.id. Assuming I'm using dataloader in my resolver for Books, my Authors resolver would be able to prime the books dataloader automatically and I could reduce my needed queries from 2 -> 1, able to prime my cache of authors & books simultaneously.

But there'd be a small gotcha for someone who might naively try to over-optimize, such as the case where they granularly want to select (instead of the *). For example, the query above may want to run SELECT id, name, relatedBookIds from books in that join (forgetting that the nested relatedBooks is also fetching for isbn). The cache would then be missing the isbn field in the queried books, which can result in some... odd situations (either needing to re-fetch which is inefficient, or return an invalid value/error). I think this could be an easy pitfall for someone to fall into if this feature were to be implemented, and should be provided as a word of caution somewhere in the documentation.

ihgann avatar May 01 '19 16:05 ihgann

Is there any way to get this merged soon?

PatrickBuTaxdoo avatar Aug 01 '19 08:08 PatrickBuTaxdoo