router
router copied to clipboard
Remove legacy validation
Fix #4407 Fix #4409 This removes GraphQL validation from the query planner, to use the Rust version instead. Validation has now moved to the query analysis layer, which means we can remove a lot of code that was there to accommodate parsing and validating the query but not rejecting it outright, because we needed to compare the validation result with the planner's.
Remaining issues:
- [x] usage reporting: since invalid queries are rejected in query analysis, inside the router service, then usage reporting for invalid queries is no longer reported, because the telemetry plugins handles usage reporting at the supergraph response level
- [x] persisted queries: apparently persisted queries use the query analysis transformation of supergraph requests, now one test fails on an invalid query, I have not yet looked at why
- [x]
@defer
directive: a test is failing because the directive is used in the query but not found by the schema. @lrlna what is apollo-compiler expecting here? Is@defer
considered builtin, or does it need to be declared in the schema?
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
- [ ] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
- [ ] Unit Tests
- [ ] Integration Tests
- [ ] Manual Tests
Exceptions
Note any exceptions here
Notes
[^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.
@Geal, please consider creating a changeset entry in /.changesets/
. These instructions describe the process and tooling.
CI performance tests
- [ ] events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
- [ ] events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
- [ ] events - Stress test for events with a lot of users and deduplication ENABLED
- [ ] large-request - Stress test with a 1 MB request payload
- [x] step - Basic stress test that steps up the number of users over time
- [ ] xlarge-request - Stress test with 10 MB request payload
- [ ] reload - Reload test over a long period of time at a constant rate of users
- [ ] no-graphos - Basic stress test, no GraphOS.
- [ ] events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
- [ ] xxlarge-request - Stress test with 100 MB request payload
- [ ] step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
- [ ] events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
- [x] const - Basic stress test that runs with a constant number of users
- [ ] events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
@defer directive: a test is failing because the directive is used in the query but not found by the schema. @lrlna what is apollo-compiler expecting here? Is @defer considered builtin, or does it need to be declared in the schema?
@Geal we do not define it as a built-in in apollo-compiler. The supergraphs that support defer, will have a defer definition already defined. And it looked like a bunch of tests [1] [2] [3] in the router already had them defined as part of the schema. I assume this test also needs it defined.
alright, that's an easy fix then
As an FYI, I know @goto-bus-stop had a change they need to run in our dual mode for a couple of days before we strip this out completely.
alright, I'm holding off on merging it for now
@lrlna @goto-bus-stop is it ok to merge now?
We still need to approve & release the apollo-compiler validation fix and get it out in a public router release (via https://github.com/apollographql/router/pull/4510, just a patch version), and then let it bake for a few days and make sure the metrics are 100%. Simon would be the best to review it but he's out today, I expect we'll have to wait a week before we can remove legacy validation entirely
updated to the latest dev now that #4510 is merged
@goto-bus-stop @SimonSapin @lrlna is there anything still blocking this?
@Geal let me take a look at the metrics real quick
@geal we haven't had any hiccups in the metrics since 1.40.1
released on monday. @goto-bus-stop the harness was matching with both implementations, yea?
I am not sure if we have metrics from the new version yet specifically from the operations that were affected before (it's at least not in the nightly analytics rollup data yet). If we do, then we are good to go
The option should also be removed from documentation: https://github.com/apollographql/router/blob/23656eac8beba4458f3e994538cf297403edc73c/docs/source/configuration/overview.mdx#L801-L811
@goto-bus-stop nice catch, thanks 746af65
Marked as approved to mean I don’t need to approve again after this change, but I do think the parse error thing should be changed.
We are using a plugin defining a supergraph_service
. Before this change, this method was called, we could do some request validation in it. After this change, it seems that this method is not called when the GraphQL query is invalid.
Can you confirm that this PR could change this behavior? Is it intended?
@yanns What was the request validation that you were doing at the supergraph service? Something GraphQL-specific?
@yanns What was the request validation that you were doing at the supergraph service? Something GraphQL-specific?
There are 2 use-cases:
- (1) as we have a public GraphQL API, we need to translate some error code. So we need a plugin that translates the error codes emitted by the router to "our" error codes. If the router rejects invalid GraphQL queries without calling plugins, we cannot modify the response.
- (2) we are using feature toggles. One GraphQL query does not use exactly the same GraphQL schema as another GraphQL query. So we need to validate requests depending on the GraphQL schema that is valid for this particular request. For this, I don't think that this PR is an issue. We configure the router with a GraphQL schema that has all possible fields. After the router has accepted the query, we validate it again with another GraphQL that is a subset of the one used by the router.
So I guess our main issue is that our plugin cannot adapt the response in case of invalid GraphQL queries.
Helllo Team,
Really appreciate the work you guys are doing.
I have a use-case where I don't want any basic validation on router level. For that I did some changes in apollo-compiler and router to ignore validation error on rust , and for the most part it is working.
But for some queries, fragment types in particular are still giving validation error.
After exploring I have found out that these validation checks are coming from js (legacy).
I even tried using experimental_graphql_validation_mode: new
in my yaml, but still it persists and validation errors are coming from Legacy.
To reproduce Query is:
query getCall1($cursor: String, $size: Int) {
call1: getCall1(cursor: $cursor, size: $size) {
results {
id
payload {
...Payload1
__typename
}
sender
__typename
}
__typename
}
}
fragment Payload1 on Payload1 {
field1
type
...Type1
...Type2
__typename
}
fragment Type1 on Type1 {
field2
__typename
}
fragment Type2 on Type2 {
field3
__typename
}
and say superschema doesn't have Type2 info.
The error I am getting is
{
"errors": [
{
"message": "Cannot find fragment name \"Type2\" (provided fragments are: [Type1, Type3])",
"extensions": {
"code": "GRAPHQL_VALIDATION_FAILED"
}
}
]
}
Any help is appreciated, thank you.
@sushant3524 Accepting invalid queries in the router may cause all kind of unsupported glitches in the router and in the query planner. The router does not do validation in JS but I think the query planner may validate its own outputs at the end of query planning. Subgraphs also usually validate queries and if they return an error the router will propagate it. Everything the router does relies on the query being valid so trying to pass through something invalid is asking for trouble.
Thanks @goto-bus-stop for the quick revert. Have reviewed much of your code in the router repository and it is a pleasure to finally engage with you. I understand and accept the risks and will very well work on it. I just want that router doesn't validate through JS and as this PR removes it, I am not sure how and why JS is validating.
Is it because of #4964 ?
Or is it because router uses JS for some other usecase like maybe query_planner and there in JS if the query looks fishy they validate it again?
after searching I have found out that this is coming in bridge_query_planner from PlannerMode
.
Are we working on removing dependency from JS for it?