lighthouse icon indicating copy to clipboard operation
lighthouse copied to clipboard

Select only necessary columns for optimized queries

Open kplgr opened this issue 4 years ago • 28 comments

What problem does this feature proposal attempt to solve? I am trying out GraphQL, testing various features in order to include it in some of the web apps I am developing with laravel. Being curious, I created a Schema and fired up all sorts of queries, followed by poking around in the SQL server to see what was going on.

What I discovered was this:

Suppose I have a Type defined which corresponds to a Model, and the database table has an ID column and 26 more columns (A, B C, ..., Z).

No matter what the Type definition is, which in my case only includes some of the columns ( say, ID and A, B, C), I can see that the SQL server is asked to fetch all 27 columns from the DB, using a SELECT * FROM [Table] statement, even though given the definition of the type at most 4 of them would be returned via the resolver.

This is an unnecessary data transfer between the DB and PHP, in my opinion.

Which possible solutions should be considered?

I propose that we construct a more refined SQL query, selecting only the columns that are requested in the GraphQL query.

I am not sure I am competent enough to try and implement this on my own, but I will give it a shot. In any case, I thought I'd share my thoughts.

kplgr avatar May 10 '20 10:05 kplgr

Technically, this can be done by constraining the SELECT to the requested fields. The fourth resolver argument ResolveInfo can be used to know which fields a client selected.

There are some pitfalls to this which require discussion:

  • Fields may not match database columns 1-1
  • Laravel assumes SELECT * by default for things like relations, virtual properties, etc.

spawnia avatar May 10 '20 11:05 spawnia

Technically, this can be done by constraining the SELECT to the requested fields. The fourth resolver argument ResolveInfo can be used to know which fields a client selected.

There are some pitfalls to this which require discussion:

* Fields may not match database columns 1-1

* Laravel assumes `SELECT *` by default for things like relations, virtual properties, etc.

Would you be so awesome as to provide an example? I've searched through the docs but I can't seem to find what I am looking for :-(

Let's say I have a model Nerd with the fields glasses and beard,

type Nerd {
    glasses: Int!
    beard: String!
}

and I want to query them with

type Query {
   nerds: [Nerd] @all
}

and I only want the SQL statement to be SELECT glasses, beard FROM nerds. Thanks in advance!

mazedlx avatar May 28 '20 12:05 mazedlx

public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

spawnia avatar May 28 '20 12:05 spawnia

public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

Thanks! Just one more stupid question: where do I put that method? On the Eloquent model?

mazedlx avatar May 28 '20 12:05 mazedlx

https://lighthouse-php.com/master/the-basics/fields.html

spawnia avatar May 28 '20 12:05 spawnia

Awesome, thanks!

mazedlx avatar May 28 '20 12:05 mazedlx

This kind of feature would be a huge performance improvement for the package. I'm implementing an endpoint for multi million record table with at least 50+ columns. I'm using query builder to make search in a given bounding box (https://wiki.openstreetmap.org/wiki/Bounding_Box) in PostgreSQL also implemented complex where conditions to filter the results. Only issue is Laravel generates SELECT * queries which affects the performance. I could use custom resolvers but when I do that I have to deal with pagination.

Following is my current Query schema:

listings(bbox: BoundingBoxInput! @builder(method: "App\\GraphQL\\Builders\\BoundingBoxSearch@bbSearch"), where: _ @whereConditions(columnsEnum: "ListingColumn")) : [Listing!]! @middleware(checks: ["auth:api"]) @paginate(maxCount: 50 defaultCount: 10)

Is there any way I can modify select columns without writing custom resolver ? Because by this way with very little code I can do so much.

canatufkansu avatar Jun 22 '20 13:06 canatufkansu

I am not planning to implement such functionality in Lighthouse myself, but am open to accepting PRs.

spawnia avatar Jun 22 '20 13:06 spawnia

I am not planning to implement such functionality in Lighthouse myself, but am open to accepting PRs.

I understand the complexity and side affects like you mentioned above about relations and virtual properties. It would be nice to have some way to control select columns like in query builder @builder directive. Thanks, even like this Lighthouse is very useful.

canatufkansu avatar Jun 22 '20 15:06 canatufkansu

Just noticed this is a duplicate of https://github.com/nuwave/lighthouse/issues/1214. An implementation attempt was also started https://github.com/nuwave/lighthouse/pull/1253

spawnia avatar Jun 22 '20 21:06 spawnia

Can't quite see how this is duplicate.

Could this be solved by implementing a select directive?

class User extends Model
{
  public function getNameAttribute()
  {
    return $this->first_name . ' ' . $this->last_name;
  }
}
type User {
  id: ID!
  first_name: String
  last_name: String
  name: String @select(fields: ["first_name", "last_name"])
}

andershagbard avatar Jul 28 '20 12:07 andershagbard

@andershagbard right, there is some overlap between the issue, but the intent is different.

Could this be solved by implementing a select directive?

That would be one part of the puzzle, good thinking.

We might need a mechanism to make the optimized selects opt-in per field / type. Otherwise, it could be quite hard to migrate existing applications towards it, since now Lighthouse always selects all the fields.

spawnia avatar Jul 28 '20 13:07 spawnia

We might need a mechanism to make the optimized selects opt-in per field / type. Otherwise, it could be quite hard to migrate existing applications towards it, since now Lighthouse always selects all the fields.

Trying to think of a good solution for this.

Could this be an opt-in in the config file? If enabled it would use the field(s) in the @select, if @select is absent, use the field name.

andershagbard avatar Jul 28 '20 13:07 andershagbard

We might do that as an MVP. I am a bit worried about migrating big applications towards optimized selects, but maybe it is not a big issue in practice.

I do not plan to implement such a feature myself, but am happy to aid in reviewing and refining a PR.

spawnia avatar Jul 28 '20 13:07 spawnia

@spawnia I am working on an implementation for this, and I am having trouble accessing child directives via $resolveInfo. Accessing the directives property on any selection in a selectionSet returns an empty NodeList.

DDynamic avatar Nov 21 '20 02:11 DDynamic

@spawnia I am working on an implementation for this, and I am having trouble accessing child directives via $resolveInfo. Accessing the directives property on any selection in a selectionSet returns an empty NodeList.

That would give you access to client directives.

This feature requires looking at schema directives. Look at other Lighthouse directives to see how that works.

spawnia avatar Nov 23 '20 13:11 spawnia

Could you give me an example of a parent directive searching through child directives? I can't think of any off the top of my head.

DDynamic avatar Nov 23 '20 19:11 DDynamic

I do not know what you mean by parent or child directives. Consider the FieldValue class and its uses, maybe you will find what you need there.

Try posting a draft PR and we can discuss implementation details.

spawnia avatar Nov 23 '20 20:11 spawnia

I got to the point that I have created at trait that get the fields that should get selected but I am facing a serious issue with relations I can't apply the select method to belongsTo and hasOne and HasMany directives for some reason is there's any easier way to make the select rule more generic and we can apply it to all directives cause I don't see a way in here

Paula2001 avatar May 29 '21 10:05 Paula2001

Hi all! other thing that I notice is about @count and @paginate:

if in the @count directive we add the columns to include in the count() method, you can choose some "indexed fields", this improve the performance

directive @count(
  """
  .....
  """

  """
  Columns
  """
  columns: [String!]
) on FIELD_DEFINITION
GRAPHQL;

something similar for @paginate might improve the performance, but in this case it's more complicated:

https://github.com/nuwave/lighthouse/blob/master/src/Pagination/PaginationArgs.php#L107

if we modify the below line to include the columns for the count process, the column parameter only affects the return columns and doesn't affect the count that execute the paginator.

return $builder->{$methodName}($this->first, ['row_id'], 'page', $this->page);

image

This is because the Laravel paginate method don't use the columns in the count execution. This is more a question for the Laravel issue queue, but maybe add another parameter in the paginate method to modify the columns in the count process

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Eloquent/Builder.php#L808 https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Query/Builder.php#L2445

dmouse avatar Dec 15 '21 19:12 dmouse

Update: I opened a discussion about the paginate method https://github.com/laravel/framework/discussions/40114

dmouse avatar Dec 20 '21 21:12 dmouse

@dmouse that's just a misconception count(*) is not the same as SELECT *, check this link for more details: https://www.sqlservercentral.com/articles/advice-on-using-count

chadidi avatar Dec 27 '21 20:12 chadidi

public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

Trying to do something similar and I found that brilliant piece you written here!! Though what is the lookahead for? Any issues with using the following?

$resolveInfo->getFieldSelection(100);

It gives the result in the most beautiful output I could have imagined to get, not sure about performance issues...

Stevemoretz avatar Nov 17 '22 12:11 Stevemoretz

$resolveInfo->getFieldSelection(100);

That is absolutely fine, if it has the information you need. lookAhead() gives you more detailed information about the selection, such as the types of the sub-fields and their arguments.

spawnia avatar Nov 18 '22 10:11 spawnia

$resolveInfo->getFieldSelection(100);

That is absolutely fine, if it has the information you need. lookAhead() gives you more detailed information about the selection, such as the types of the sub-fields and their arguments.

Thank you so much for explaining that, so now I know how to access arguments of sub-fields thanks to that method! Just two questions to make my knowledge a little more about this,

  1. can we grab info from a parent-field in the tree from a sub-node?
$_, array $args, GraphQLContext $context, ResolveInfo $resolveInfo

$_ here gives the parent result, but can't access for instance parent args or args from parent of the parent...

  1. can we pass in data from upper-fields nodes to lower ones?

Let's say that I have an array which I got based on the info I was provided in an upper-field now in a sub node (in the same branch) I want to access those data, how could I pass data from upper node to lower nodes in the branch? I thought context was there for this but I was wrong!

Stevemoretz avatar Nov 18 '22 16:11 Stevemoretz

I think those questions are off-topic, you may ask in StackOverflow or Slack.

spawnia avatar Nov 20 '22 19:11 spawnia

Right now, two pull requests are trying to solve this issue:

  • https://github.com/nuwave/lighthouse/pull/1626 is following a relatively simple approach but lacks tests
  • https://github.com/nuwave/lighthouse/pull/2235 tries to do more magic, which I believe to be misguided

Going forward, I think the minimal viable solution for this issue is to implement the manual approach first. A marker directive needs to be added to fields to inform Lighthouse which columns need to be selected for it, e.g.:

"""
Defines which columns must be selected from the database to resolve the field.

If specified on at least one field of a type, Lighthouse will `SELECT` only the explicitly
mentioned columns matching the queried fields. For fields without this directive,
no columns will be selected, so make sure to specify this for all fields that need it.
"""
directive @select(columns: [String!]!) on FIELD_DEFINITION

type User {
  id: ID! @select(columns: ["id"])
  first_name: String! @select(columns: ["first_name"])
  last_name: String! @select(columns: ["last_name"])
  full_name: String! @select(columns: ["first_name", "last_name"])
  rights: [Right!]! @hasMany
}

Only if the fields of a type are explicitly marked with @select will Lighthouse replace the default SELECT * with a constrained SELECT, based upon the fields selected in the query.

{ users { id last_name full_name } }
# SELECT id, last_name, first_name

spawnia avatar Aug 29 '23 09:08 spawnia

@spawnia I agree on your approach on it being an entirely opt-in feature through the @select.

I've been following this for a while as I have one table that has a couple of large blob columns (not the best design but it's a legacy project), without constraining the SELECT the performance is terrible. I've had to fix this through a builder that modifies the query, but some way of doing it as you've shown would be great.

petecoop avatar Oct 31 '23 13:10 petecoop