router
router copied to clipboard
Redact sensitive data from responses but include in router logs
Describe the solution you'd like I would like sensitive data of errors to be redacted (excluded) from apollo router query responses. Additionally, I wish that the error data would be found in router's logs (otherwise we wouldn't see it anywhere).
Describe alternatives you've considered The default behaviour of router could be changed to achieve this result. It may also be possible to achieve the result with custom plugins.
Additional context
I have been experimenting with the error redacting in Apollo router and checking how it behaves in comparison with Apollo gateway.
For my tests I was using router v0.9.2 with APOLLO_ROUTER_LOG=info . I did not set any value for experimental.include_subgraph_errors, so in practice I was using this:
By default, the router will redact subgraph errors from clients.
My expectation was that I’d get any sensitive info excluded from router responses but that there will be enough info in router logs.
In my tests below, “dev gateway” means apollo gateway with NODE_ENV=development.
And “prod gateway” means apollo gateway with NODE_ENV=production.
Here are my test results:
Symbols: :white_check_mark: - good, behaves as I expected :x: - bad, I expected it to behave differently
1. No connection to subgraph In this test case apollo router and gateway have no connection to the subgraph at all. :white_check_mark: errors in logs for dev and prod gateway are the same, and they are thorough enough :white_check_mark: gateway dev shows stack traces in responses, gateway prod does not :white_check_mark: router does not include stack trace in response :white_check_mark: router prints thorough info to the logs :x: router reveals thorough info about the error in its response
2. Wrong field requested In my test case I requested a field that is not defined in the schema. :white_check_mark: gateway dev prints stack trace in response, gateway prod does not. :white_check_mark: router does not print stack trace in response :white_check_mark: gateways and router print an understandable message in the response :white_check_mark: only gateways include extensions.code GRAPHQL_VALIDATION_FAILED :x: router does not include any extensions.code :x: router does not print anything understandable to its own log, just “Status code: 500 Internal Server Error”
3. Bad user input In my test case I supplied a number for a (nested) input parameter that’s supposed to be a string :white_check_mark: gateway dev prints stack trace in responses, gateway prod and router do not :white_check_mark: gateways instruct more thoroughly with the error message in their response where the problem is :white_check_mark: / :x: router instructs less clearly about the error in the response’s error message but still likely on an understandable level. Specific nested fields within the input field are not mentioned. :x: gateways wrongly print to their logs that the operation was successful (may also be our custom code) :x: router does not print anything to its logs
4. Unreachable datasource or an Error thrown in subgraph
In this test case I was running a subgraph (subgraphA) locally and did not have one of the data sources reachable from it. The same result is achieved if I throw new Error('Custom error') in the subgraph’s query resolver.
:white_check_mark: gateway dev prints stack traces in the response, gateway prod does not
:white_check_mark: gateways print thorough error with stack trace to their logs
:x: router does not properly redact the error thrown by the locally running subgraphA. Too much info is revealed in it, including stack traces from the subgraphA.
:x: router does not print anything to its logs
I have setup and replicated these tests (router only) and added some additional data.
- I confirmed that this scenario results in error logging and a detailed error response which reveals that the router could not connect to the subgraph. Exposed details include:
- subgraph name
- HTTP fetch error occurred (TCP connect error with OS code 61)
- In my recreation, I saw this error:
{"errors":[{"message":"value retrieval failed: spec error: invalid type error, expected another type than 'Named type User_'","locations":[],"path":null}]}%
In the logs I saw:
2022-06-27T09:29:57.261011Z ERROR request{method=POST uri=/ version=HTTP/1.1 otel.kind=server otel.status_code= otel.status_code="ERROR"}: tower_http::trace::on_failure: response failed classification=Status code: 500 Internal Server Error latency=2 ms
- (In my query, I supplied a string when it should have been a number)
{"errors":[{"message":"invalid type for variable: 'first'","locations":[],"path":null,"extensions":{"type":"ValidationInvalidTypeVariable","name":"first"}}]}%
Nothing was written to the logs.
- I noted that a full stack-trace was reported to the client and nothing written to logs. This is a bug caused by the disabling of the subgraph error redaction functionality. If I re-enable that functionality, the subgraph error is correctly redacted. There is no log entry. A small modification to the plugin added a log WARN that errors were being redacted.
Summary (for now): 4 can be addressed relatively easily with a fix to re-enable an existing plugin. 1 requires more invasive changes since the redaction plugin is by-passed in these cases of failure. 2 and 3 require more investigation.
Revisiting this to test with https://github.com/apollographql/router/releases/tag/v1.0.0-alpha.3.
Client response:
{
"errors": [
{
"message": "HTTP fetch failed from 'accounts': HTTP fetch failed from 'redacted': redacted",
"path": [],
"extensions": {
"type": "SubrequestHttpError",
"service": "accounts",
"reason": "HTTP fetch failed from 'redacted': redacted"
}
}
]
}
Logs:
2022-09-08T08:40:16.760035Z ERROR request{method=POST uri=/ version=HTTP/1.1 otel.kind=server otel.status_code=}:supergraph{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" client_name="" client_version="" otel.kind=internal}:execution{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" otel.kind=internal}:fetch{otel.kind=internal}:subgraph{name="accounts" graphql.document="query ExampleQuery__accounts__0{me{id}}" graphql.operation.name="ExampleQuery__accounts__0" otel.kind=internal}: apollo_router::services::subgraph_service: fetch_error="hyper::Error(Connect, ConnectError(\"tcp connect error\", Os { code: 61, kind: ConnectionRefused, message: \"Connection refused\" }))"
2022-09-08T08:40:16.760453Z INFO request{method=POST uri=/ version=HTTP/1.1 otel.kind=server otel.status_code=}:supergraph{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" client_name="" client_version="" otel.kind=internal}:execution{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" otel.kind=internal}:fetch{otel.kind=internal}: apollo_router::plugins::include_subgraph_errors: redacted subgraph(accounts) error
2022-09-08T08:40:16.760581Z ERROR request{method=POST uri=/ version=HTTP/1.1 otel.kind=server otel.status_code=}:supergraph{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" client_name="" client_version="" otel.kind=internal}:execution{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" otel.kind=internal}: apollo_router::query_planner: Fetch error: HTTP fetch failed from 'accounts': HTTP fetch failed from 'redacted': redacted
The information is redacted from the client response but clearly present in the logs to facilitate problem solving.
Client response:
{
"errors": [
{
"message": "value retrieval failed: spec error: invalid type error, expected another type than 'Named type User'",
"path": null
}
]
}
Logs: Nothing in the logs now.
ACTION: Should the extensions code be included in the response? Also, it would be helpful to log the error.
Client response:
{
"errors": [
{
"message": "invalid type for variable: 'first'",
"path": null,
"extensions": {
"type": "ValidationInvalidTypeVariable",
"name": "first"
}
}
]
}
Logs: Nothing in the logs.
ACTION: Should we log this as an error?
Client response:
{
"data": {
"me": null
},
"errors": [
{
"message": "Subgraph errors redacted",
"path": null
}
]
}
Logs:
2022-09-08T09:56:33.982669Z INFO request{method=POST uri=/ version=HTTP/1.1 otel.kind=server otel.status_code=}:supergraph{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" client_name="" client_version="" otel.kind=internal}:execution{graphql.document="query ExampleQuery {\n me {\n id\n }\n}\n" graphql.operation.name="ExampleQuery" otel.kind=internal}:fetch{otel.kind=internal}: apollo_router::plugins::include_subgraph_errors: redacted subgraph(accounts) errors
My summary would be that 1 and 4 are now working fine. For 2 and 3 we need to decide if logging these errors is helpful. For 2 we need to decide if the extensions code should be added.
Discussions within the development team led us to conclude that not all users would require this type of logging in all environments. We think the requirements need some design and have filed #1732 to track this.
The extensions.code change is due to land in these fixes:
https://github.com/apollographql/federation-rs/pull/169
https://github.com/apollographql/router/pull/1720
Once this is done, I'll verify the behaviour and close this ticket as complete apart from the logging requests which will now be tracked in #1732.
Verifying 2 on main:
I couldn't get the router to accept a non-existent field for testing purposes, because that doesn't even make it into the query planner. To test this more comprehensively I came up with a query with an invalid fragment:
query Me {
me {
id
...fragment
}
}
(The query would be valid, but for the ...fragment)
This results in:
{
"errors": [
{
"message": "Unknown fragment \"fragment\".",
"extensions": {
"code": "GRAPHQL_VALIDATION_FAILED"
}
}
]
}
It looks like the expected behaviour is now happening.
This fix will be in the next release of the router. I'll close when the release happens.
Closing as fix now released.