router icon indicating copy to clipboard operation
router copied to clipboard

API 1.0 - QueryPlan has no way to propagate errors.

Open BrynCooke opened this issue 2 years ago • 5 comments

Unlike the rest of the data-structures QueryPlannerResponse doesn't have an errors field. This is problematic as the error handling mechanism in tower is only really suitable for unrecoverable pipeline errors.

  • [ ] Add Response to QueryPlanner
  • [ ] Ensure that errors are propagated through the chain of responses

BrynCooke avatar May 24 '22 11:05 BrynCooke

I can have a first look tomorrow. Should errors from the query planner end up in the errors JSON key of GraphQL responses? Do we want the same graphql::Response type as everywhere else?

SimonSapin avatar Jun 28 '22 16:06 SimonSapin

I started at QueryPlannerResponse::error_new which used to ignore its errors parameter and emit a warning. Then propagated those errors values as necessary, and arrived at PR https://github.com/apollographql/router/pull/1358.

It’s only after that that I noticed QueryPlannerResponse::error_new doesn’t have any caller. Maybe it’s intended for plugin authors?

Are there any cases today where the query planner emits errors? (That may not yet be correctly propagated.) Where in the code does that happen?

SimonSapin avatar Jul 05 '22 16:07 SimonSapin

https://github.com/apollographql/router/blob/b6a26397f811f526dbd32e3d6852bf8391abf297/apollo-router/src/query_planner/caching_query_planner.rs#L120-L152

This is where a CacheResolverError (enum whose only variant contains QueryPlannerError) gets converted to tower::BoxError

SimonSapin avatar Jul 11 '22 09:07 SimonSapin

Based on discussion with @garypen only Rhai errors still need more handling:

  • For errors coming the "actual" query planner, the current behavior is already what we want: the service returns a Result whose errors are propagated at [1] with the ? operator so that they short-circuit the rest of the pipeline. If we don’t have a query plan, don’t do anything else

  • Rust plugins that provide a query planner service are given a nested query planner service and are responsible for calling it and propagating errors with the ? operator, since they also return a Result.

  • When a Rhai plugin throws an error an the query planner stage, QueryPlannerResponse::error_new is called. Currently this ignores the error with a tracing warning. This should be changed to short-circuit the rest of the pipeline and propagate the error.

[1] https://github.com/apollographql/router/blob/b6a26397f811f526dbd32e3d6852bf8391abf297/apollo-router/src/services/router_service.rs#L135-L143

SimonSapin avatar Jul 19 '22 12:07 SimonSapin

We’ve discussed this and agreed that query planner errors should not go through Result::Err. Instead we’ll change QueryPlannerResponse to:

  • Add a Vec<apollo_router::graphql::Error>
  • Make the query plan optional, so that it is not present when the query planner encountered a fatal error. Such an error would be in the Vec

This will make handling of Rhai errors in the query-planner stage the same as in other stages.

SimonSapin avatar Aug 11 '22 11:08 SimonSapin

With https://github.com/apollographql/router/pull/1552 there are no query-planner-level Rhai plugins anymore. https://github.com/apollographql/router/pull/1504 is still relevant in order to use GraphQL errors instead of Result::Err(BoxError), but it doesn’t affect the public API anymore. So removing the 1.0 label.

SimonSapin avatar Aug 23 '22 13:08 SimonSapin