graphql-go icon indicating copy to clipboard operation
graphql-go copied to clipboard

No way to get requested fields inside resolver

Open auwtch opened this issue 9 years ago • 38 comments

Problem: Inefficient microservice/database queries. If I need to return 100 orders with their clients, I have to make 101 requests to microservices (1 for batch of orders, 100 for their clients (one per each order)).

Scheme (simplified):

type Query {
  orders: [Order]
}

type Order {
  id: ID
  client: Client
}

type Client {
  id: ID
  name: String
}

Query:

query GetOrders {
  orders {
    client {
      name
    }
  }
}

Resolvers:

func (r *Resolver) Orders() []*orderResolver {
	data, err := Request(orderService, "Order.List", &Args{}) // returns batch
	...
	return // batch of 100 order resolvers
}

// 100 client resolvers - 100 requests
func (r *orderResolver) Client() *clientResolver {
	data, err := Request(clientService, "Client.Get",&Args{) // returns one 
	...
	return // a client resolver
}

Instead, I'd like to know (in the order resolver) that the field with clients has been requested. So I could make one batch request at the order resolver level.

Something like this:

func (r *Resolver) Orders(ctx context.Context) []*orderResolver {
	data, err := Request(orderService, "Order.List", &Args{}) // returns batch
	if ctx.Value("fields")... {
		  clients, err := Request(clientService, "Client.List",&Args{) // batch
        }
	...
	return // batch of 100 order resolvers witch clients inside
}

So I get 2 request.

auwtch avatar Nov 11 '16 14:11 auwtch

This is definitely a feature we should think about. Is there anything like that in the graphql-js implementation?

I am a bit worried that this can cause people to write bad resolvers. One nice attribute of GraphQL are all those consistency guarantees it provides.

neelance avatar Nov 13 '16 21:11 neelance

I agree that this could cause people to write bad code resolvers, but it could also help write efficient resolvers. For example, one could use that information to prevent an unnecessary "join" in his/her db query if the user didn't ask for any of those fields. Also, it could give user the ability to log which fields are being requested most frequently and then perform optimizations on those fields (like pulling from redis-cache rather than db).

The last variable passed to graphql resolvers in graphql-js has lots of info about the schema, fields, ect...

see: https://github.com/graphql/graphql-js/blob/master/src/type/definition.js#L489

nicksrandall avatar Nov 15 '16 19:11 nicksrandall

@nicksrandall Do you know any example code that uses this feature with graphql-js?

neelance avatar Nov 15 '16 22:11 neelance

@DenisNeustroev Facebook has a recommendation on how to solve the n+1 problem with a technique called DataLoader. I have taken a stab at implementing this technique in go here: https://github.com/nicksrandall/dataloader.

@neelance I am not aware of any examples of people using the schema info in a resolver but I haven't done much research on the subject.

nicksrandall avatar Nov 29 '16 17:11 nicksrandall

Being able to see what fields were requested would make it super easy to select a specific list of columns when performing an SQL query in a resolver. That means, we don't have to get all fields (SELECT * FROM) which could be more efficient.

F21 avatar Dec 13 '16 04:12 F21

I know context can be a divisive topic in go, but reading the graphql spec had me wondering: if a resolver gets 3 args (i.e. obj, args, context), could a server implementation stuff the selection set in there? Off the top of my head I was thinking if the context knew the selection set and knew what in the selection set had already been resolved, it could save on extra requests

mvpmvh avatar Jan 16 '17 16:01 mvpmvh

I have a similar issue, need a way to either defer the resolution so I can produce a query, or some way to access the nested fields of the query to properly form the query instead of simply filtering results a very expensive query (or possibly unknown, in which case it's impossible). Anyone come up with a workaround?

EDIT: I ended up hacking it in. I think ideally there could be an optional preparation pass to the set of resolvers, or some similar set of structs, letting you construct the queries necessary first. Some variant of that haha.

tj avatar Mar 21 '17 19:03 tj

@tj I took at stab at implementing this in #70 . Would you mind sharing what you did as a work around?

nicksrandall avatar Mar 21 '17 23:03 nicksrandall

I'm still not a fan of exposing all this data to the upper resolvers. They should not care about that.

However, the initial example by @DenisNeustroev definitely shows a need. What about instead of integrating that clients, err := Request(clientService, "Client.List",&Args{...}) into the Orders resolver, we instead add exactly that, a way to process a batch of Client resolvers in one go?

neelance avatar Mar 21 '17 23:03 neelance

@neelance what do you mean by "upper resolvers"? I think it is reasonable for a resolver to be aware of it's children and thus aware of the fields requested for itself and it's children.

Also, I like the idea of baking in the ability to execute a batch of resolvers in one go. I've been doing a form of that with my dataloader implementation and it has been very helpful for the performance of my service.

nicksrandall avatar Mar 22 '17 02:03 nicksrandall

It's nearly impossible to optimize a query without that information. For example my current use-case is to fetch a series of rows from Postgres, then I have to construct a query for Elasticsearch based on those, however I have to know what metrics the user is asking for, otherwise it's simply impossible to create the query.

checks() {
  name
  url
  metrics {
    time_total {
      min
      max
      avg
    }

    errors {
      sum
    }
  }
}

I'm new to GQL so maybe this is a weird approach, you could maybe do it by passing arguments but it would be super ugly to the point where I wouldn't want to use GQL for it anyway.

If this is an anti-pattern definitely let me know haha, but it seems logical to me. The main thing I like about GQL is this expressiveness vs passing metrics(check_id: ID, select: ['min', 'max']) or whatever that would look like, which also wouldn't work in my case due to nesting.

tj avatar Mar 22 '17 15:03 tj

@tj How much performance difference does it make in your case to fetch some fields vs. all fields? If all fields come from the same table (no JOIN), then the difference might be so small that it would be premature optimization.

neelance avatar Mar 22 '17 15:03 neelance

In my case it would be hugely detrimental, I'd have to fetch every possible permutation of what they can ask for and then filter. They're pretty expensive aggregate queries on large amounts of data as well so it's not like a quick select from Postgres, more like going from 200ms queries to 5s+.

For the SQL use-case I totally agree that selecting specific columns is usually not a huge deal, I almost always select * those anyway.

tj avatar Mar 22 '17 15:03 tj

All right. For that situation you'll probably want to use a custom resolver. That feature will be added soon, I'm just currently busy with adding validations.

neelance avatar Mar 22 '17 15:03 neelance

Cool cool no worries! The context thing will keep me covered for now, just passing the fields to a template to generate the massive Elasticsearch query haha.

tj avatar Mar 22 '17 17:03 tj

Ahh hitting cases where custom resolver would be the only way to go now. Doing something like this to provide a histogram(field: ..., interval: ...):

"{{or .Alias .Name}}": {
  "histogram": {
    "field": "result.{{.Args.stat | normalize_stat}}",
    "interval": {{.Args.interval}}
  }
}

which can't be properly mapped unless you have access to the name or alias anyway.

tj avatar Mar 23 '17 18:03 tj

What about something like https://github.com/neelance/graphql-go/pull/169 ?

xmirya avatar Feb 25 '18 23:02 xmirya

Having an issue with this as well, I'm eating the performance hit fine in my case though. The bigger issue is not being able to get query arguments in resolvers nested all the way down. I'm looking at the solution here for selectedFields (https://github.com/DealTap/graphql-go/pull/5) and hacking that in to include arguments as well

abradley2 avatar Jul 22 '18 15:07 abradley2

Now that we have two alternative solutions for this issue I decided to give my two cents about it.

  1. Adding one more optional parameters to resolver methods (#169). In general go doesn't have optional parameters (excluding the case with final variadic parameter), but since this package dynamically resolves methods by name the idea with optional parameters already works great (the first optional context.Context one). The main argument against seems to be "we can't just keep on adding optional parameters for everything we think of". This makes a lot of sense, however could anybody think of something else fundamental that could require an optional parameter? If there are really several other feature requests that makes sense to get fixed with yet another optional parameter, then let's really avoid this solution.

  2. Introducing a function which extracts selected field from the context (DealTap#5). To be fair this is one of the few good reason to use context values for as stated in package's docs:

Use context Values only for request-scoped data that transits processes and APIs, not for passing optional parameters to functions.

However, I argue that this is a good idea giving that Go doesn't have an idiomatic way to pass optional parameters unless dynamic method dispatch is in place. Think of what the user's code will look like? In case 1 the optional parameter is added only where needed with already populated and ready-to-use value. Nothing else get changed. In case 2 instead there will be one repetitive block with something like:

selectedFields, err := selected.GetFieldsFromContext(ctx)
if err != nil {
	// for some reason the context value is missing. We know that one can't
	// send a query with zero selected fields, then we must assume all
	// fields are selected... I guess.
	selectedFields = somehowGetAllFieldsForThisQueryOrHardCodeThem()
}

// Here selectedFields is just what would've been anyways with an optional
// parameter.

The unfortunate part is that there's really no reliable and safe way to make selected.GetFieldsFromContext deduce all possible fields for this query and simply return them without ever giving an error.


TL;DR: I think the solution proposed in #169 is the right way to resolve this issue and should go towards getting it ready to be merged and eventually merging it.

vladimiroff avatar Jul 30 '18 09:07 vladimiroff

@DenisNeustroev since in your case the upstream resolver (i.e. Orders) knows how to construct the batch query for the downstream resolver (i.e. Client) you can still get only two requests (at most) with the current implementation. See the following pseudo-code:

func newClientGetter(ids []string) func(id string) (*client, error) {
	var once sync.Once
	var data interface{}
	var err error

	// Lazy loads the batch when called
	return func(id string) (*client, error) {
                 // will only ever fire one request
		once.Do(func() {
			// batch
			data, err = Request(clientService, "Client.List", &Args{})
		})
		if err != nil {
			return nil, err
		}

                 // clientFromData(interface{}, string) (*client, error)
		return clientFromData(data, id)
	}
}

func (r *Resolver) Orders() []*orderResolver {
	data, err := Request(orderService, "Order.List", &Args{}) // returns batch

	// Assuming you can get the client IDs from data
        // idsFromData(interface{}) []string
	clientGetter := newClientGetter(idsFromData(data))

         ...

	return // batch of 100 order resolvers witch clientGetter inside
}

// 100 client resolvers - 1 request
func (r *orderResolver) Client() *clientResolver {
	data, err := r.clientGetter(r.clientID)

         ...

	return // a client resolver
}

If N > 0 clients are needed only one Request(clientService, "Client.List", &Args{}) will be fired. If 0 clients are needed no Request(clientService, "Client.List", &Args{}) is ever fired.

I've had success with this approach every time there's a nested batch query where an upstream resolver has enough info to pre-build the query. Hopefully this can help you or @abradley2.

Thoughts?

matiasanaya avatar Nov 16 '18 07:11 matiasanaya

I've found a way around this without having to change the lib itself by creating another lib just for this purpose: gbaptista/requested-fields

I don't like the idea of having to parse the query a second time or need to manually set the hierarchy of resolvers, but it was a simple and quick solution that works.

gbaptista avatar Feb 10 '19 22:02 gbaptista

A use case that is completely missed here is the difference between doing SELECT * and SELECT some_field, another_field can be realized when a row has a particularly heavy blob attached to it. This reason alone can make a dramatic performance improvement.

Additionally, this is common enough to use this information in both the Javascript implementation as well as the Ruby implementation (I've done it myself). With other implementations, they simply provide the AST to the resolver from the point in which the graph is currently being walked. It basically came for free in the other implementations.

The ruby implementation even goes as far as allowing plugins to be able to make changes to the AST by using a copy-and-modify technique to keep immutability of the AST. This allows them to implement custom directives as a form of AST manipulation. https://github.com/rmosolgo/graphql-ruby/blob/master/guides/language_tools/visitor.md

mgwidmann avatar Mar 26 '19 15:03 mgwidmann

I need to decide what to do within resolver based on fields requested by client, how to get the fields list requested by user? why not created one struct with all parameter related to request, including parents fields and inject as parameter on resolver? any update about this? I'd like to use this lib but seems is not maturity enough for production use which reach real situations, is it?

rodrigorodriguescosta avatar Sep 09 '19 03:09 rodrigorodriguescosta

I started using this lib a few months ago, but now we're running into this limitation. It's unfortunate. I think I'm going to have to rip it out, which is going to be a bunch of work.

jeanpi avatar Feb 09 '20 18:02 jeanpi

We need this too. We are looking to do field usage reporting and metrics in addition to tracing. We'd prefer not to spread the monitoring logic throughout the application. There doesn't seem to be a feasible way to do this currently.

Jmoore1127 avatar Mar 25 '20 21:03 Jmoore1127

I've had #373 open for a few months as a possible solution to this issue. Would someone mind taking a look?

keithmattix avatar Jul 27 '20 15:07 keithmattix

#373 seems a very non-intrusive implementation and not really risky. The problem to be solved is pretty urgent though. Every time I read that one of the great things about graphql is that you only get what you ask for, I die a little, as the problem is still there at api-implementation which doesn't help at all. Can I do something to get a solution in? I've looked at #373. Is this the one I should be checking in production and give an OK (or not) on?

jhelberg avatar Dec 21 '20 13:12 jhelberg

Additional vote here for #373. Using the context argument makes it backwards compatible and opt-in, so I think this is the best approach of the ones I've seen submitted.

andrewmostello avatar Dec 31 '20 16:12 andrewmostello

I've closed #373 in favor of #428 which is off of my fork's master branch. Hopefully, if there's any other work maintainers would like to see first, please let me know :)

keithmattix avatar Jan 02 '21 23:01 keithmattix

Hi all. Where do we stand on this today?

I love the way this library works, but I have use cases that necessitate an "eject" button to resolve selected fields from another GraphQL instance (different shard). (My case closely matches this comment.)

From what I've found, these are all the proposals and their statuses:

  • https://github.com/graph-gophers/graphql-go/pull/70 – The earliest proposal (2017). Injects the raw fields from exec into the context. Stalled, pointed to 169 and referred to dataloader.
  • https://github.com/graph-gophers/graphql-go/pull/169 – Follow-up proposal to 70. Great discussions and presentations of various use cases supporting this idea. Provides inner selected fields as optional 3rd parameter to a field resolver. Notably missing the current node's info, which spawned 422. Recently active comments from others, seems to be waiting on tests.
  • https://github.com/graph-gophers/graphql-go/pull/422 – Embeds the current field, including it's inner selected fields. Uses the context. Conversation stalled.
  • https://github.com/graph-gophers/graphql-go/pull/373, https://github.com/graph-gophers/graphql-go/pull/428 – Very similar to 169, though selected fields placed into the context instead of the optional 3rd parameter. 373 closed in favor of 428, 428 closed in favor of other PRs.
  • https://github.com/DealTap/graphql-go/pull/5 – A fork and it's PR, merged using a context approach.

Edit: I thought I should add: This missing feature is looking like a deal-breaker for my team. Without this flexibility, we hit issues creating a global GraphQL service with multiple, regional database shards. A dataloader helps, but even 2 serial round-trip requests to another region is quite expensive time-wise.

JohnStarich avatar Oct 13 '21 03:10 JohnStarich