
Enforce resource limits during query validation: pagination limits, query cost calculations

Open andrew-kolesnikov opened this issue 2 years ago • 9 comments

I would like the ability to enforce pagination limits at the router level, without delegating to subgraphs

Here's an example:

query Comments($cursor: String) {
  Comments(first: 99999999999999999, after: $cursor) {
  ...

In other words, I am looking to implement a basic version of https://docs.github.com/en/graphql/overview/resource-limitations at the router level

Related discussion in https://github.com/apollographql/router/discussions/1246
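For reference, the basic check being asked for can be sketched as a validation pass that rejects oversized first/last arguments before the query reaches any subgraph. This is a hypothetical illustration only (regex-based for brevity; a real router would walk the parsed operation AST instead):

```python
import re

MAX_PAGE_SIZE = 100  # assumed limit, would come from router config

def validate_pagination(query: str, max_first: int = MAX_PAGE_SIZE) -> list[str]:
    """Return a violation message for every `first:`/`last:` argument
    whose literal value exceeds max_first."""
    violations = []
    for match in re.finditer(r"\b(first|last)\s*:\s*(\d+)", query):
        arg, value = match.group(1), int(match.group(2))
        if value > max_first:
            violations.append(f"{arg}: {value} exceeds limit {max_first}")
    return violations
```

A request like the Comments example above would then be rejected during validation rather than fanned out to subgraphs.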

andrew-kolesnikov avatar Jun 14 '22 20:06 andrew-kolesnikov

Hi and thanks for raising this issue. I don't think this is a feature which would be available by default in the router. It seems very specialised. I've put some details into the discussion you started.

garypen avatar Jun 15 '22 09:06 garypen

Thank you @garypen for the detailed response in the discussion thread.

When you say it won't be available by default in the router, do you mean any sort of resource limiting such as points-based query cost, quotas or rate limiting won't be implemented at the router level by default?

Are router users then expected to delegate resource control to subgraphs or is there something else to try? (other than writing a non-trivial custom plugin)

andrew-kolesnikov avatar Jun 15 '22 16:06 andrew-kolesnikov

I'm not saying the router won't support any kind of resource limiting; it was more a comment on this exact form of resource limiting. Best practice is still an emerging concept in this area (although, I note that https://ibm.github.io/graphql-specs/cost-spec.html looks interesting) and I think it will take a while yet to settle down.

I do believe there is a role for a resource limiting mechanism in the router at some point in the future.

garypen avatar Jun 16 '22 07:06 garypen

Thank you @garypen for the detailed response in the discussion thread.

When you say it won't be available by default in the router, do you mean any sort of resource limiting such as points-based query cost, quotas or rate limiting won't be implemented at the router level by default?

Are router users then expected to delegate resource control to subgraphs or is there something else to try? (other than writing a non-trivial custom plugin)

Just out of curiosity, how do you expect this to be implemented in the future without writing a custom plugin? This seems like a very specialized thing. At my company we handle query validation at the router level to ensure that the query the user is sending is a "registered" query (which sorta defeats the purpose of GraphQL in a sense). Then we do variable validation using GraphQL directives on the subgraphs.

For example, we have a pagination directive that specifies a maximum page size on specific fields, the validation happens during query analysis stage. For context, we use gqlgen.
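A minimal sketch of the approach described above, assuming a per-field limit table of the kind a @pagination-style directive would populate at schema load time (the field name and limit here are made up for illustration):

```python
# field coordinate -> maximum allowed page size (assumed values)
PAGE_LIMITS = {"Query.users": 25}

def check_page_size(field: str, args: dict) -> None:
    """Raise during query analysis if the `first` argument for a
    directive-limited field exceeds its declared maximum."""
    limit = PAGE_LIMITS.get(field)
    first = args.get("first")
    if limit is not None and first is not None and first > limit:
        raise ValueError(f"{field}: first={first} exceeds max page size {limit}")
```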

LockedThread avatar Jun 19 '22 21:06 LockedThread

@LockedThread:

we have a pagination directive that specifies a maximum page size on specific fields, the validation happens during query analysis stage.

Are you also analyzing in a way that's similar to the style that @andrew-kolesnikov used? (e.g., are you using the Connection specification or something custom?)

If you could provide an example, that would be great. This use case certainly sounds interesting!

abernix avatar Jul 01 '22 11:07 abernix

Adding @lleadbet for visibility. Our current challenge has grown beyond pagination limits: we're looking to port query cost calculations from Node.js, i.e. something like https://github.com/slicknode/graphql-query-complexity for the Rust router. That is something I think could benefit a lot of folks too, so any suggestions would be much appreciated.
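For anyone else following along, the general shape of what graphql-query-complexity computes can be sketched as a recursive walk where list fields multiply their children's cost by the requested page size. The nested-dict selection structure and weights below are simplified stand-ins, not the library's actual API:

```python
def complexity(selection: dict, weights: dict, default: int = 1) -> int:
    """Estimate cost of a selection tree: each field contributes a base
    weight, and its children are multiplied by the `first` argument
    (defaulting to 1 when no page size is requested)."""
    total = 0
    for field, node in selection.items():
        base = weights.get(field, default)
        children = node.get("selections", {})
        multiplier = node.get("args", {}).get("first", 1)
        total += base + multiplier * complexity(children, weights, default)
    return total
```

For example, a `products(first: 10)` selection with a weight-2 connection, a weight-2 `dimensions` field, and a weight-10 `volume` field would cost 2 + 10 * (1 + 12) = 132 under this scheme.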

andrew-kolesnikov avatar Aug 11 '22 15:08 andrew-kolesnikov

Ping @lennyburdette I think he's currently experimenting with a related topic

bnjjj avatar Aug 11 '22 15:08 bnjjj

@LockedThread:

we have a pagination directive that specifies a maximum page size on specific fields, the validation happens during query analysis stage.

Are you also analyzing in a way that's similar to the style that @andrew-kolesnikov used? (e.g., are you using the Connection specification or something custom?)

If you could provide an example, that would be great. This use case certainly sounds interesting!

I apologize for the lack of responsiveness on this; I guess it just slipped through my notifications. Anyway, a public example of this use case from a project I am a part of:

Schema Definition: https://github.com/KnightHacks/knighthacks_users/blob/12ce0bb9b608091f6517fe071e11a848908c28db/graph/schema.graphqls#L12

Example Usage: https://github.com/KnightHacks/knighthacks_users/blob/12ce0bb9b608091f6517fe071e11a848908c28db/graph/schema.graphqls#L130

For the actual pagination we use the Connection specification. On one of my previous projects I tweaked the Connection specification to allow for circular pagination along with using this strategy.

We use https://github.com/99designs/gqlgen for our graphql server library.

LockedThread avatar Aug 11 '22 16:08 LockedThread

I just had an idea of how to implement this limit. Every field/query/mutation on the graph would have a cost value associated with it, with a max cost defined in the router config. It could potentially be implemented with directives, though I'm not sure whether exposing the costs in the schema would give prospective attackers too much visibility.

I am sure other systems accomplish this, but it hasn't been done in any federated system because existing implementations all live in the GraphQL server libraries themselves.

Examples

Schema Definition (modified version of this)

directive @cost(
  value: Float,
  calculatedValue: String
) on FIELD_DEFINITION | ARGUMENT_DEFINITION | INPUT_FIELD_DEFINITION | ENUM_VALUE

type ProductVariation {
  id: ID!
}

type ProductDimension {
  size: String
  weight: Float

  # Does math so the cost is higher
  volume: Float @cost(value: 10.0)
}

type Product  {
  id: ID!
  sku: String
  package: String

  # Variation and Dimensions are fields that require another 
  # database call in the subgraph, therefore their cost is higher.
  variation: ProductVariation @cost(value: 2.0)
  dimensions: ProductDimension @cost(value: 2.0)
}

type ProductConnection {
    totalCount: Int
    pageInfo: PageInfo!
    products: [Product!]!
}

type PageInfo {
    startCursor: String!
    endCursor: String!
}

type Query {
  # Uses the calculatedValue directive field which supports math and string interpolation with variables
  products(first: Int!, after: ID): ProductConnection! @cost(calculatedValue: "1.2 * $first")
  product(id: ID!): Product @cost(value: 1.0)
}

Query (Cost: 1.0)

product(id: "abc123") {
    id
    sku
    package
}

Query (Cost: 5.0)

product(id: "abc123") {
    id
    sku
    package
    variation
    dimensions
}

Query (Cost: 264)

products cost: 1.2 * 20 = 24
dimensions cost: 2.0 * 20 = 40
volume cost: 20 * 10 = 200

products(first: 20) {
    products {
        id
        sku
        package
        dimensions {
            volume
        }
    }
}
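The arithmetic above can be reproduced with a small sketch, assuming the calculatedValue expression is evaluated with variables substituted and that per-item field costs are multiplied by the requested page size (this is an illustration of the proposal, not a real implementation):

```python
def products_query_cost(first: int) -> float:
    """Cost of the products query under the directives in the example
    schema: a root cost from `calculatedValue: "1.2 * $first"` plus
    per-item costs for `dimensions` and `volume`."""
    root = 1.2 * first        # @cost(calculatedValue: "1.2 * $first")
    dimensions = 2.0 * first  # @cost(value: 2.0), resolved once per item
    volume = 10.0 * first     # @cost(value: 10.0), resolved once per item
    return root + dimensions + volume
```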

LockedThread avatar Aug 11 '22 16:08 LockedThread

I published https://github.com/apollosolutions/router-basic-operation-cost yesterday as a starting point for demonstrating cost and depth limiting. I’m not at all satisfied with the cost analysis (it doesn’t take lists or abstract types into account) but hopefully it’s somewhat helpful.
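For context, the depth-limiting half is straightforward to sketch: recursively find the deepest selection set and reject operations past a threshold. The nested-dict structure below is illustrative, not the plugin's actual API:

```python
def max_depth(selection: dict) -> int:
    """Depth of the deepest selection set in a nested-dict operation
    (0 for an empty selection)."""
    if not selection:
        return 0
    return 1 + max(max_depth(node.get("selections", {})) for node in selection.values())
```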

lennyburdette avatar Aug 18 '22 00:08 lennyburdette

Does anyone in this thread have any feedback they want to share about @lennyburdette's solution?

abernix avatar Nov 22 '22 13:11 abernix

I have feedback! It's not really a query cost analysis algorithm, it's really just a weighted field counter. Flaws include:

  • it doesn't implement the field collection algorithm so it will count repeated fields (e.g. from different fragments) unnecessarily
  • it doesn't take lists into account, so it will severely undercount cost when fields are resolved repeatedly
  • it doesn't take abstract types into account
  • it doesn't take @skip or @include into account
  • and I'm still waiting for the improvement to apollo-compiler that lets us avoid re-parsing the entire schema on each request
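To make the second flaw concrete: a weighted field counter charges each child field once, while a list-aware analysis multiplies by the requested page size, so a `first: 100` connection gets undercounted by a factor of 100. A toy illustration (made-up weights, not the plugin's code):

```python
def naive_cost(weights: dict, fields: list) -> int:
    # weighted field counter: each selected field charged exactly once
    return sum(weights.get(f, 1) for f in fields)

def list_aware_cost(weights: dict, fields: list, page_size: int) -> int:
    # list-aware: every item in the page resolves every child field
    return page_size * naive_cost(weights, fields)
```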

lennyburdette avatar Nov 22 '22 15:11 lennyburdette