Optional instrumentation for recording GraphQL response field lengths in OTel
Overview
Adds a new instrumentation config, `graphql`, which supports a single metric called `field.length`. When enabled, this will publish the lengths of array fields returned in primary supergraph responses. This is primarily meant to help debug unexpected cost values calculated by the demand control plugin, as these discrepancies are multiplied by the length of lists in the responses.
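To make concrete what "lengths of array fields" means, here is a minimal Python sketch that walks a response's `data` and collects one length per list it encounters. It is an illustration only, not the router's implementation: the function name `list_field_lengths`, the dotted path format, and the sample response are all assumptions made for this example.

```python
from typing import Any, Iterator, Tuple


def list_field_lengths(value: Any, path: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (field path, length) for every list found in a response's data.

    Hypothetical helper: the path format is made up for illustration.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from list_field_lengths(child, child_path)
    elif isinstance(value, list):
        # One measurement per list occurrence, including empty lists.
        yield path, len(value)
        for item in value:
            # List items share the path of the list field that contains them.
            yield from list_field_lengths(item, path)


# Made-up sample response.
response = {
    "data": {
        "users": [
            {"name": "a", "posts": [{"id": 1}, {"id": 2}, {"id": 3}]},
            {"name": "b", "posts": []},
        ]
    }
}

lengths = list(list_field_lengths(response["data"]))
print(lengths)
```

In this sample, `posts` is measured once per user, so a wrong assumed list size for `posts` would be counted again for every element of `users`. That is the multiplication effect this metric is meant to help debug.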
Primary responses only
Note that this implementation does not work for deferred responses. The primary blocker is that we don't currently have a way to zip a response with a query when that response doesn't start at the query root. To make this work, we would need to take the deferred response's JSON path and determine which subsection of the schema to use for the zip procedure.
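As a rough sketch of the missing step, the Python snippet below follows a deferred response's path through a simplified selection tree to find the sub-selection that the zip would need to start from. Everything here is hypothetical: the nested-dict stand-in for a parsed query, the `subselection_at` name, and the sample path. It ignores aliases, fragments, and type conditions, all of which a real implementation would have to handle.

```python
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for a parsed query: each field name maps to
# its nested selection set (an empty dict for leaf fields).
Selection = Dict[str, "Selection"]
PathSegment = Union[str, int]


def subselection_at(selection: Selection, path: List[PathSegment]) -> Optional[Selection]:
    """Follow a response path through a selection tree.

    Integer segments are list indices; they exist in the response
    but have no counterpart in the query, so they are skipped.
    Returns None if the path names a field the query did not select.
    """
    current = selection
    for segment in path:
        if isinstance(segment, int):
            continue
        if segment not in current:
            return None
        current = current[segment]
    return current


# Made-up query shape and deferred path.
query: Selection = {"users": {"name": {}, "posts": {"id": {}}}}
deferred_path: List[PathSegment] = ["users", 0, "posts"]

sub = subselection_at(query, deferred_path)
print(sub)
```

With a sub-selection like this in hand, a deferred payload could in principle be zipped against it rather than against the query root.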
No support for custom attributes
The other instrumentation configurations support custom metrics using predefined attributes; for example, you can create a custom router metric based on the HTTP response status code. This functionality comes from the custom histogram/attribute/selector framework we've implemented, but this GraphQL field-related code does not seem to fit cleanly into those existing abstractions. In the interest of time, I've settled on creating this one-off metric, which is not extensible and cannot be used in custom metrics.
No support for conditions
One change not included in this PR that we will need to add is support for filtering via conditions. This metric will be published for every list field across all responses when enabled, which has the potential to produce far more information than is useful or wanted. The existing conditions implementation is likely not compatible with this implementation as-is because we need to check a given condition for each field in the response when determining if we should publish the metric or not. The current conditions setup will cache any evaluated condition, such that if the condition is true once, it will be rewritten to a static true condition that will not be re-evaluated. We will need to create some uncached equivalent which can be evaluated several times within a single request pipeline to be used with this field length metric. That will be coming in the next PR.
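The caching problem described above can be illustrated with a small Python sketch. The two classes are hypothetical models written for this example, not the router's condition types, and the "length greater than 100" predicate is an invented condition.

```python
from typing import Callable, List, Tuple


class CachedCondition:
    """Models a condition that becomes a static `true` once it has matched."""

    def __init__(self, predicate: Callable[[int], bool]) -> None:
        self._predicate = predicate
        self._static_true = False

    def evaluate(self, value: int) -> bool:
        if self._static_true:
            return True  # never re-evaluated after the first match
        result = self._predicate(value)
        if result:
            self._static_true = True
        return result


class UncachedCondition:
    """Models the uncached equivalent: evaluated fresh for every field."""

    def __init__(self, predicate: Callable[[int], bool]) -> None:
        self._predicate = predicate

    def evaluate(self, value: int) -> bool:
        return self._predicate(value)


def is_long(length: int) -> bool:
    # Invented filter: only publish lists longer than 100 items.
    return length > 100


# Made-up (field, length) pairs from a single response.
field_lengths: List[Tuple[str, int]] = [
    ("users", 2),
    ("users.posts", 150),
    ("users.tags", 3),
]

cached = CachedCondition(is_long)
uncached = UncachedCondition(is_long)

cached_hits = [name for name, n in field_lengths if cached.evaluate(n)]
uncached_hits = [name for name, n in field_lengths if uncached.evaluate(n)]
print(cached_hits, uncached_hits)
```

Once the cached condition matches `users.posts`, it also lets `users.tags` through even though that list is short, which is why a per-field filter needs a condition that is re-evaluated on every call.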
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
- [X] Changes are compatible[^1]
- [ ] Documentation[^2] completed
- [ ] Performance impact assessed and acceptable
- Tests added and passing[^3]
    - [X] Unit Tests
    - [ ] Integration Tests
    - [ ] Manual Tests
Exceptions
Note any exceptions here
Notes
[^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.