federation-jvm-spring-example icon indicating copy to clipboard operation
federation-jvm-spring-example copied to clipboard

chore(deps): update ghcr.io/apollographql/router docker tag to v1.61.1

Open renovate[bot] opened this issue 8 months ago • 0 comments

This PR contains the following updates:

Package Update Change
ghcr.io/apollographql/router minor v1.32.0 -> v1.61.1

Release Notes

apollographql/router (ghcr.io/apollographql/router)

v1.61.1

Compare Source

🐛 Fixes
Use correct default values on omitted OTLP endpoints (PR #​6931)

Previously, when the configuration didn't specify an OTLP endpoint, the Router would always default to http://localhost:4318. However, port 4318 is the correct default only for the HTTP protocol, while port 4317 should be used for gRPC.

Additionally, all other telemetry defaults in the Router configuration consistently use 127.0.0.1 as the hostname rather than localhost.

With this change, the Router now uses:

  • http://127.0.0.1:4317 as the default for gRPC protocol
  • http://127.0.0.1:4318 as the default for HTTP protocol

This ensures protocol-appropriate port defaults and consistent hostname usage across all telemetry configurations.

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6931

Separate entity keys and representation variables in entity cache key (Issue #​6673)

This fix separates the entity keys and representation variable values in the cache key, to avoid issues with @requires for example.

By @​bnjjj in https://github.com/apollographql/router/pull/6888

🔒 Security
Add batching.maximum_size configuration option to limit maximum client batch size (PR #​7005)

Add an optional maximum_size parameter to the batching configuration.

  • When specified, the router will reject requests which contain more than maximum_size queries in the client batch.
  • When unspecified, the router performs no size checking (the current behavior).

If the number of queries provided exceeds the maximum batch size, the entire batch fails with error code 422 (Unprocessable Content). For example:

{
  "errors": [
    {
      "message": "Invalid GraphQL request",
      "extensions": {
        "details": "Batch limits exceeded: you provided a batch with 3 entries, but the configured maximum router batch size is 2",
        "code": "BATCH_LIMIT_EXCEEDED"
      }
    }
  ]
}

By @​carodewig in https://github.com/apollographql/router/pull/7005

🔍 Debuggability
Add apollo.router.pipelines metrics (PR #​6967)

When the router reloads, either via schema change or config change, a new request pipeline is created. Existing request pipelines are closed once their requests finish. However, this may not happen if there are ongoing long requests that do not finish, such as Subscriptions.

To enable debugging when request pipelines are being kept around, a new gauge metric has been added:

  • apollo.router.pipelines - The number of request pipelines active in the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration

By @​BrynCooke in https://github.com/apollographql/router/pull/6967

Add apollo.router.open_connections metric (PR #​7023)

To help users to diagnose when connections are keeping pipelines hanging around, the following metric has been added:

  • apollo.router.open_connections - The number of request pipelines active in the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration.
    • server.address - The address that the router is listening on.
    • server.port - The port that the router is listening on if not a unix socket.
    • state - Either active or terminating.

You can use this metric to monitor when connections are open via long running requests or keepalive messages.

By @​BrynCooke in https://github.com/apollographql/router/pull/7009

v1.61.0

Compare Source

🚀 Features
Query planner dry-run option (PR #​6656)

This PR adds a new dry-run option to the Apollo-Expose-Query-Plan header value that emits the query plans back to Studio for visualizations. This new value will only emit the query plan, and abort execution. This can be helpful for tools like rover, where query plan generation is needed but not full runtime, or for potentially prewarming query plan caches out of band.

curl --request POST --include \
     --header 'Accept: application/json' \
     --header 'Apollo-Expose-Query-Plan: dry-run' \
     --url 'http://127.0.0.1:4000/' \
     --data '{"query": "{ topProducts { upc name } }"}'

By @​aaronArinder and @​lennyburdette in https://github.com/apollographql/router/pull/6656.

Enable Remote Proxy Downloads

This enables users without direct download access to specify a remote proxy mirror location for the github download of the Apollo Router releases.

By @​LongLiveCHIEF in https://github.com/apollographql/router/pull/6667

🐛 Fixes
Header propagation rules passthrough (PR #​6690)

Header propagation contains logic to prevent headers from being propagated more than once. This was broken in https://github.com/apollographql/router/pull/6281 which always considered a header propagated regardless if a rule actually matched.

This PR alters the logic so that a header is marked as fixed only when it's populated.

The following will now work again:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b

Note that defaulting a header WILL populate it, so make sure to include your defaults last in your propagation rules.

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
          default: defaulted # This will prevent any further rule evaluation for header `b`
      - propagate:
          named: b

Instead, make sure that your headers are defaulted last:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b
          default: defaulted # OK

By @​BrynCooke in https://github.com/apollographql/router/pull/6690

Entity cache: fix directive conflicts in cache-control header (Issue #​6441)

Unnecessary cache-control directives are created in cache-control header. The router will now filter out unnecessary values from the cache-control header when the request resolves. So if there's max-age=10, no-cache, must-revalidate, no-store, the expected value for the cache-control header would simply be no-store. Please see the MDN docs for justification of this reasoning: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#preventing_storing

By @​bnjjj in https://github.com/apollographql/router/pull/6543

Query Planning: fix __typename selections in sibling typename optimization

The query planner uses an optimization technique called "sibling typename", which attaches __typename selections to their sibling selections so the planner won't need to plan them separately.

Previously, when there were multiple identical selections and one of them has a __typename attached, the query planner could pick the one without the attachment, effectively losing a __typename selection.

Now, the query planner favors the one with a __typename attached without losing the __typename selection.

By @​duckki in https://github.com/apollographql/router/pull/6824

📃 Configuration
Promote experimental_otlp_tracing_sampler config to stable (PR #​6070)

The router's otlp tracing sampler feature that was previously experimental is now generally available.

If you used its experimental configuration, you should migrate to the new configuration option:

  • telemetry.apollo.experimental_otlp_tracing_sampler is now telemetry.apollo.otlp_tracing_sampler

The experimental configuration option is now deprecated. It remains functional but will log warnings.

By @​garypen in https://github.com/apollographql/router/pull/6070

Promote experimental_local_manifess config for persisted queries to stable

The experimental_local_manifests PQ configuration option is being promoted to stable. This change updates the configuration option name and any references to it, as well as the related documentation. The experimental_ usage remains valid as an alias for existing usages.

By @​trevor-scheer in https://github.com/apollographql/router/pull/6564

🛠 Maintenance
Reduce demand control allocations on start/reload (PR #​6754)

When demand control is enabled, the router now preallocates capacity for demand control's processed schema and shrinks to fit after processing. When it's disabled, the router skips the type processing entirely to minimize startup impact.

By @​tninesling in https://github.com/apollographql/router/pull/6754

v1.60.1

Compare Source

🐛 Fixes
Header propagation rules passthrough (PR #​6690)

Header propagation contains logic to prevent headers from being propagated more than once. This was broken in https://github.com/apollographql/router/pull/6281 which always considered a header propagated regardless if a rule actually matched.

This PR alters the logic so that only when a header is populated then the header is marked as fixed.

The following will now work again:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b

Note that defaulting a head WILL populate a header, so make sure to include your defaults last in your propagation rules.

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
          default: defaulted # This will prevent any further rule evaluation for header `b`
      - propagate:
          named: b

Instead, make sure that your headers are defaulted last:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b
          default: defaulted # OK

By @​BrynCooke in https://github.com/apollographql/router/pull/6690

Entity cache: fix directive conflicts in cache-control header (Issue #​6441)

Unnecessary cache-control directives are created in cache-control header. The router will now filter out unnecessary values from the cache-control header when the request resolves. So if there's max-age=10, no-cache, must-revalidate, no-store, the expected value for the cache-control header would simply be no-store. Please see the MDN docs for justification of this reasoning: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#preventing_storing

By @​bnjjj in https://github.com/apollographql/router/pull/6543

Resolve regressions in fragment compression for certain operations (PR #​6651)

In v1.58.0 we introduced a new compression strategy for subgraph GraphQL operations to replace an older, more complicated algorithm.

While we were able to validate improvements for a majority of cases, some regressions still surfaced. To address this, we are extending it to compress more operations with the following outcomes:

  • The P99 overhead of running the new compression algorithm on the largest operations in our corpus is now just 10ms
  • In case of better compression, at P99 it shrinks the operations by 50Kb when compared to the old algorithm
  • In case of worse compression, at P99 it only adds an additional 108 bytes compared to the old algorithm, which was an acceptable trade-off versus added complexity

By @​dariuszkuc in https://github.com/apollographql/router/pull/6651

v1.60.0

Compare Source

🚀 Features
Improve BatchProcessor observability (Issue #​6558)

A new metric has been introduced to allow observation of how many spans are being dropped by an telemetry batch processor.

  • apollo.router.telemetry.batch_processor.errors - The number of errors encountered by exporter batch processors.
    • name: One of apollo-tracing, datadog-tracing, jaeger-collector, otlp-tracing, zipkin-tracing.
    • error = One of channel closed, channel full.

By observing the number of spans dropped it is possible to estimate what batch processor settings will work for you.

In addition, the log message for dropped spans will now indicate which batch processor is affected.

By @​bryncooke in https://github.com/apollographql/router/pull/6558

🐛 Fixes
Improve performance of query hashing by using a precomputed schema hash (PR #​6622)

The router now uses a simpler and faster query hashing algorithm with more predictable CPU and memory usage. This improvement is enabled by using a precomputed hash of the entire schema, rather than computing and hashing the subset of types and fields used by each query.

For more details on why these design decisions were made, please see the PR description

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6622

Truncate invalid error paths (PR #​6359)

This fix addresses an issue where the router was silently dropping subgraph errors that included invalid paths.

According to the GraphQL Specification an error path must point to a response field:

If an error can be associated to a particular field in the GraphQL result, it must contain an entry with the key path that details the path of the response field which experienced the error.

The router now truncates the path to the nearest valid field path if a subgraph error includes a path that can't be matched to a response field,

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6359

Eagerly init subgraph operation for subscription primary nodes (PR #​6509)

When subgraph operations are deserialized, typically from a query plan cache, they are not automatically parsed into a full document. Instead, each node needs to initialize its operation(s) prior to execution. With this change, the primary node inside SubscriptionNode is initialized in the same way as other nodes in the plan.

By @​tninesling in https://github.com/apollographql/router/pull/6509

Fix increased memory usage in sysinfo since Router 1.59.0 (PR #​6634)

In version 1.59.0, Apollo Router started using the sysinfo crate to gather metrics about available CPUs and RAM. By default, that crate uses rayon internally to parallelize its handling of system processes. In turn, rayon creates a pool of long-lived threads.

In a particular benchmark on a 32-core Linux server, this caused resident memory use to increase by about 150 MB. This is likely a combination of stack space (which only gets freed when the thread terminates) and per-thread space reserved by the heap allocator to reduce cross-thread synchronization cost.

This regression is now fixed by:

  • Disabling sysinfo’s use of rayon, so the thread pool is not created and system processes information is gathered in a sequential loop.
  • Making sysinfo not gather that information in the first place since Router does not use it.

By @​SimonSapin in https://github.com/apollographql/router/pull/6634

Optimize demand control lookup (PR #​6450)

The performance of demand control in the router has been optimized.

Previously, demand control could reduce router throughput due to its extra processing required for scoring.

This fix improves performance by shifting more data to be computed at plugin initialization and consolidating lookup queries:

  • Cost directives for arguments are now stored in a map alongside those for field definitions
  • All precomputed directives are bundled into a struct for each field, along with that field's extended schema type. This reduces 5 individual lookups to a single lookup.
  • Response scoring was looking up each field's definition twice. This is now reduced to a single lookup.

By @​tninesling in https://github.com/apollographql/router/pull/6450

Fix missing Content-Length header in subgraph requests (Issue #​6503)

A change in 1.59.0 caused the Router to send requests to subgraphs without a Content-Length header, which would cause issues with some GraphQL servers that depend on that header.

This solves the underlying bug and reintroduces the Content-Length header.

By @​nmoutschen in https://github.com/apollographql/router/pull/6538

🛠 Maintenance
Remove the legacy query planner (PR #​6418)

The legacy query planner has been removed in this release. In the previous release, router v1.58, it was no longer used by default but was still available through the experimental_query_planner_mode configuration key. That key is now removed.

Also removed are configuration keys which were only relevant to the legacy planner:

  • supergraph.query_planning.experimental_parallelism: the new planner can always use available parallelism.
  • supergraph.experimental_reuse_query_fragments: this experimental algorithm that attempted to reuse fragments from the original operation while forming subgraph requests is no longer present. Instead, by default new fragment definitions are generated based on the shape of the subgraph operation.

By @​SimonSapin in https://github.com/apollographql/router/pull/6418

Migrate various metrics to OTel instruments (PR #​6476, PR #​6356, PR #​6539)

Various metrics using our legacy mechanism based on the tracing crate are migrated to OTel instruments.

By @​goto-bus-stop in https://github.com/apollographql/router/pull/6476, https://github.com/apollographql/router/pull/6356, https://github.com/apollographql/router/pull/6539

📚 Documentation
Add instrumentation configuration examples (PR #​6487)

The docs for router telemetry have new example configurations for common use cases for selectors and condition.

By @​shorgi in https://github.com/apollographql/router/pull/6487

🧪 Experimental
Remove experimental_retry option (PR #​6338)

The experimental_retry option has been removed due to its limited use and functionality during its experimental phase.

By @​bnjjj in https://github.com/apollographql/router/pull/6338

v1.59.2

Compare Source

[!IMPORTANT]

This release contains important fixes which address resource utilization regressions which impacted Router v1.59.0 and v1.59.1. These regressions were in the form of:

  1. A small baseline increase in memory usage; AND
  2. Additional per-request CPU and memory usage for queries which included references to abstract types with a large number of implementations

If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🐛 Fixes
Improve performance of query hashing by using a precomputed schema hash (PR #​6622)

The router now uses a simpler and faster query hashing algorithm with more predictable CPU and memory usage. This improvement is enabled by using a precomputed hash of the entire schema, rather than computing and hashing the subset of types and fields used by each query.

For more details on why these design decisions were made, please see the PR description

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6622

Fix increased memory usage in sysinfo since Router 1.59.0 (PR #​6634)

In version 1.59.0, Apollo Router started using the sysinfo crate to gather metrics about available CPUs and RAM. By default, that crate uses rayon internally to parallelize its handling of system processes. In turn, rayon creates a pool of long-lived threads.

In a particular benchmark on a 32-core Linux server, this caused resident memory use to increase by about 150 MB. This is likely a combination of stack space (which only gets freed when the thread terminates) and per-thread space reserved by the heap allocator to reduce cross-thread synchronization cost.

This regression is now fixed by:

  • Disabling sysinfo’s use of rayon, so the thread pool is not created and system processes information is gathered in a sequential loop.
  • Making sysinfo not gather that information in the first place since Router does not use it.

By @​SimonSapin in https://github.com/apollographql/router/pull/6634

v1.59.1

Compare Source

[!IMPORTANT]

This release was impacted by a resource utilization regression which was fixed in v1.59.2. See the release notes for that release for more details. As a result, we recommend using v1.59.2 rather than v1.59.1 or v1.59.0.

🐛 Fixes
Fix transmitted header value for Datadog priority sampling resolution (PR #​6017)

The router now transmits correct values of x-datadog-sampling-priority to downstream services.

Previously, an x-datadog-sampling-priority of -1 was incorrectly converted to 0 for downstream requests, and 2 was incorrectly converted to 1. When propagating to downstream services, this resulted in values of USER_REJECT being incorrectly transmitted as AUTO_REJECT.

Enable accurate Datadog APM metrics (PR #​6017)

The router supports a new preview feature, the preview_datadog_agent_sampling option, to enable sending all spans to the Datadog Agent so APM metrics and views are accurate.

Previously, the sampler option in telemetry.exporters.tracing.common.sampler wasn't Datadog-aware. To get accurate Datadog APM metrics, all spans must be sent to the Datadog Agent with a psr or sampling.priority attribute set appropriately to record the sampling decision.

The preview_datadog_agent_sampling option enables accurate Datadog APM metrics. It should be used when exporting to the Datadog Agent, via OTLP or Datadog-native.

telemetry:
  exporters:
    tracing:
      common:

##### Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
        sampler: 0.1

##### Send all spans to the Datadog agent.
        preview_datadog_agent_sampling: true

Using these options can decrease your Datadog bill, because you will be sending only a percentage of spans from the Datadog Agent to Datadog.

[!IMPORTANT]

  • Users must enable preview_datadog_agent_sampling to get accurate APM metrics. Users that have been using recent versions of the router will have to modify their configuration to retain full APM metrics.
  • The router doesn't support in-agent ingestion control.
  • Configuring traces_per_second in the Datadog Agent won't dynamically adjust the router's sampling rate to meet the target rate.
  • Sending all spans to the Datadog Agent may require that you tweak the batch_processor settings in your exporter config. This applies to both OTLP and Datadog native exporters.

Learn more by reading the updated Datadog tracing documentation for more information on configuration options and their implications.

Fix non-parent sampling (PR #​6481)

When the user specifies a non-parent sampler the router should ignore the information from upstream and use its own sampling rate.

The following configuration would not work correctly:

  exporters:
    tracing:
      common:
        service_name: router
        sampler: 0.00001
        parent_based_sampler: false

All spans are being sampled. This is now fixed and the router will correctly ignore any upstream sampling decision.

By @​BrynCooke in https://github.com/apollographql/router/pull/6481

v1.59.0

Compare Source

[!IMPORTANT] Router version 1.53.0 through to 1.59.0 have an issue where users of the Datadog exporter will see all traces sampled at 100%. This is due to the Router incorrectly setting the priority sampled flag on spans 100% of the time. This will cause all traces that are sent to Datadog agent to be forwarded on to Datadog, potentially incurring costs.

Update to 1.59.1 to resolve this issue. Datadog users may wish to enable preview_datadog_agent_sampling to enable accurate APM metrics.

[!IMPORTANT]

This release was impacted by a resource utilization regression which was fixed in v1.59.2. See the release notes for that release for more details. As a result, we recommend using v1.59.2 rather than v1.59.1 or v1.59.0.

[!IMPORTANT] If you have enabled distributed query plan caching, updates to the query planner in this release will result in query plan caches being regenerated rather than reused. On account of this, you should anticipate additional cache regeneration cost when updating to this router version while the new query plans come into service.

🚀 Features
General availability of native query planner

The router's native, Rust-based, query planner is now generally available and enabled by default.

The native query planner achieves better performance for a variety of graphs. In our tests, we observe:

  • 10x median improvement in query planning time (observed via apollo.router.query_planning.plan.duration)
  • 2.9x improvement in router’s CPU utilization
  • 2.2x improvement in router’s memory usage

Note: you can expect generated plans and subgraph operations in the native query planner to have slight differences when compared to the legacy, JavaScript-based query planner. We've ascertained these differences to be semantically insignificant, based on comparing ~2.5 million known unique user operations in GraphOS as well as comparing ~630 million operations across actual router deployments in shadow mode for a four month duration.

The native query planner supports Federation v2 supergraphs. If you are using Federation v1 today, see our migration guide on how to update your composition build step. Subgraph changes are typically not needed.

The legacy, JavaScript, query planner is deprecated in this release, but you can still switch back to it if you are still using Federation v1 supergraph:

experimental_query_planner_mode: legacy

Note: The subgraph operations generated by the query planner are not guaranteed consistent release over release. We strongly recommend against relying on the shape of planned subgraph operations, as new router features and optimizations will continuously affect it.

By @​sachindshinde, @​goto-bus-stop, @​duckki, @​TylerBloom, @​SimonSapin, @​dariuszkuc, @​lrlna, @​clenfest, and @​o0Ignition0o.

Ability to skip persisted query list safelisting enforcement via plugin (PR #​6403)

If safelisting is enabled, a router_service plugin can skip enforcement of the safelist (including the require_id check) by adding the key apollo_persisted_queries::safelist::skip_enforcement with value true to the request context.

Note: this doesn't affect the logging of unknown operations by the persisted_queries.log_unknown option.

In cases where an operation would have been denied but is allowed due to the context key existing, the attribute persisted_queries.safelist.enforcement_skipped is set on the apollo.router.operations.persisted_queries metric with value true.

By @​glasser in https://github.com/apollographql/router/pull/6403

Add fleet awareness plugin (PR #​6151)

A new fleet_awareness plugin has been added that reports telemetry to Apollo about the configuration and deployment of the router.

The reported telemetry include CPU and memory usage, CPU frequency, and other deployment characteristics such as operating system and cloud provider. For more details, along with a full list of data captured and how to opt out, go to our data privacy policy.

By @​jonathanrainer, @​nmoutschen, @​loshz in https://github.com/apollographql/router/pull/6151

Add fleet awareness schema metric (PR #​6283)

The router now supports the apollo.router.instance.schema metric for its fleet_detector plugin. It has two attributes: schema_hash and launch_id.

By @​loshz and @​nmoutschen in https://github.com/apollographql/router/pull/6283

Support client name for persisted query lists (PR #​6198)

The persisted query manifest fetched from Apollo Uplink can now contain a clientName field in each operation. Two operations with the same id but different clientName are considered to be distinct operations, and they may have distinct bodies.

The router resolves the client name by taking the first from the following that exists:

  • Reading the apollo_persisted_queries::client_name context key that may be set by a router_service plugin
  • Reading the HTTP header named by telemetry.apollo.client_name_header, which defaults to apollographql-client-name

If a client name can be resolved for a request, the router first tries to find a persisted query with the specified ID and the resolved client name.

If there is no operation with that ID and client name, or if a client name cannot be resolved, the router tries to find a persisted query with the specified ID and no client name specified. This means that existing PQ lists that don't contain client names will continue to work.

To learn more, go to persisted queries docs.

By @​glasser in https://github.com/apollographql/router/pull/6198

🐛 Fixes
Fix coprocessor empty body object panic (PR #​6398)

Previously, the router would panic if a coprocessor responds with an empty body object at the supergraph stage:

{
  ... // other fields
  "body": {} // empty object
}

This has been fixed in this release.

Note: the previous issue didn't affect coprocessors that responded with formed responses.

By @​BrynCooke in https://github.com/apollographql/router/pull/6398

Ensure cost directives are picked up when not explicitly imported (PR #​6328)

With the recent composition changes, importing @cost results in a supergraph schema with the cost specification import at the top. The @cost directive itself is not explicitly imported, as it's expected to be available as the default export from the cost link. In contrast, uses of @listSize to translate to an explicit import in the supergraph.

Old SDL link

@​link(
    url: "https://specs.apollo.dev/cost/v0.1"
    import: ["@​cost", "@​listSize"]
)

New SDL link

@​link(url: "https://specs.apollo.dev/cost/v0.1", import: ["@​listSize"])

Instead of using the directive names from the import list in the link, the directive names now come from SpecDefinition::directive_name_in_schema, which is equivalent to the change we made on the composition side.

By @​tninesling in https://github.com/apollographql/router/pull/6328

Fix query hashing algorithm (PR #​6205)

The router includes a schema-aware query hashing algorithm designed to return the same hash across schema updates if the query remains unaffected. This update enhances the algorithm by addressing various corner cases to improve its reliability and consistency.

By @​Geal in https://github.com/apollographql/router/pull/6205

Fix typo in persisted query metric attribute (PR #​6332)

The apollo.router.operations.persisted_queries metric reports an attribute when a persisted query was not found. Previously, the attribute name was persisted_quieries.not_found, with one i too many. Now it's persisted_queries.not_found.

By @​goto-bus-stop in https://github.com/apollographql/router/pull/6332

Fix telemetry instrumentation using supergraph query selector (PR #​6324)

Previously, router telemetry instrumentation that used query selectors could log errors with messages such as this is a bug and should not happen.

These errors have now been fixed, and configurations with query selectors such as the following work properly:

telemetry:
  exporters:
    metrics:
      common:
        views:

##### Define a custom view because operation limits are different than the default latency-oriented view of OpenTelemetry
          - name: oplimits.*
            aggregation:
              histogram:
                buckets:
                  - 0
                  - 5
                  - 10
                  - 25
                  - 50
                  - 100
                  - 500
                  - 1000
  instrumentation:
    instruments:
      supergraph:
        oplimits.aliases:
          value:
            query: aliases
          type: histogram
          unit: number
          description: "Aliases for an operation"
        oplimits.depth:
          value:
            query: depth
          type: histogram
          unit: number
          description: "Depth for an operation"
        oplimits.height:
          value:
            query: height
          type: histogram
          unit: number
          description: "Height for an operation"
        oplimits.root_fields:
          value:
            query: root_fields
          type: histogram
          unit: number
          description: "Root fields for an operation"

By @​bnjjj in https://github.com/apollographql/router/pull/6324

More consistent attributes on apollo.router.operations.persisted_queries metric (PR #​6403)

Version 1.28.1 added several unstable metrics, including apollo.router.operations.persisted_queries.

When an operation is rejected, Router includes a persisted_queries.safelist.rejected.unknown attribute on the metric. Previously, this attribute had the value true if the operation is logged (via log_unknown), and false if the operation is not logged. (The attribute is not included at all if the operation is not rejected.) This appears to have been a mistake, as you can also tell whether it is logged via the persisted_queries.logged attribute.

Router now only sets this attribute to true, and never to false. Note these metrics are unstable and will continue to change.

By @​glasser in https://github.com/apollographql/router/pull/6403

Drop experimental reuse fragment query optimization option (PR #​6354)

Drop support for the experimental reuse fragment query optimization. This implementation was not only very slow but also very buggy due to its complexity.

Auto generation of fragments is a much simpler (and faster) algorithm that in most cases produces better results. Fragment auto generation is the default optimization since v1.58 release.

By @​dariuszkuc in https://github.com/apollographql/router/pull/6353

📃 Configuration
Add version number to distributed query plan cache keys (PR #​6406)

The router now includes its version number in the cache keys of distributed cache entries. Given that a new router release may change how query plans are generated or represented, including the router version in a cache key enables the router to use separate cache entries for different versions.

If you have enabled distributed query plan caching, expect additional processing for your cache to update for this router release.

By @​SimonSapin in https://github.com/apollographql/router/pull/6406

🛠 Maintenance
Remove catch_unwind wrapper around the native query planner (PR #​6397)

As part of internal maintenance of the query planner, the catch_unwind wrapper around the native query planner has been removed. This wrapper served as an extra safeguard for potential panics the native planner could produce. The native query planner however no longer has any code paths that could panic. We have also not witnessed a panic in the last four months, having processed 560 million real user operations through the native planner.

This maintenance work also removes backtrace capture for federation errors, which was used for debugging and is no longer necessary as we have the confidence in the native planner's implementation.

By @​lrlna in https://github.com/apollographql/router/pull/6397

Deprecate various metrics (PR #​6350)

Several metrics have been deprecated in this release, in favor of OpenTelemetry-compatible alternatives:

  • apollo_router_deduplicated_subscriptions_total - use the apollo.router.operations.subscriptions metric's subscriptions.deduplicated attribute.
  • apollo_authentication_failure_count - use the apollo.router.operations.authentication.jwt metric's authentication.jwt.failed attribute.
  • apollo_authentication_success_count - use the apollo.router.operations.authentication.jwt metric instead. If the authentication.jwt.failed attribute is absent or false, the authentication succeeded.
  • apollo_require_authentication_failure_count - use the http.server.request.duration metric's http.response.status_code attribute. Requests with authentication failures have HTTP status code 401.
  • apollo_router_timeout - this metric conflates timed-out requests from client to the router, and requests from the router to subgraphs. Timed-out requests have HTTP status code 504. Use the http.response.status_code attribute on the http.server.request.duration metric to identify timed-out router requests, and the same attribute on the http.client.request.duration metric to identify timed-out subgraph requests.

The deprecated metrics will continue to work in the 1.x release line.

By @​goto-bus-stop in https://github.com/apollographql/router/pull/6350

v1.58.1

Compare Source

[!IMPORTANT] Router version 1.53.0 through to 1.59.0 have an issue where users of the Datadog exporter will see all traces sampled at 100%. This is due to the Router incorrectly setting the priority sampled flag on spans 100% of the time. This will cause all traces that are sent to Datadog agent to be forwarded on to Datadog, potentially incurring costs.

Update to 1.59.1 to resolve this issue. Datadog users may wish to enable preview_datadog_agent_sampling to enable accurate APM metrics.

[!IMPORTANT] If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🐛 Fixes
Particular supergraph telemetry customizations using the query selector do not error (PR #​6324)

Telemetry customizations like those featured in the request limits telemetry documentation now work as intended when using the query selector on the supergraph layer. Prior to this fix, this was sometimes causing a this is a bug and should not happen error, but is now resolved.

By @​bnjjj in https://github.com/apollographql/router/pull/6324

Native query planner now receives both "plan" and "path" limits configuration (PR #​6316)

The native query planner now correctly sets two experimental configuration options for limiting query planning complexity. These were previously available in the configuration and observed by the legacy planner, but were not being passed to the new native planner until now:

  • supergraph.query_planning.experimental_plans_limit
  • supergraph.query_planning.experimental_paths_limit

By @​goto-bus-stop in https://github.com/apollographql/router/pull/6316

v1.58.0

Compare Source

[!IMPORTANT] Router version 1.53.0 through to 1.59.0 have an issue where users of the Datadog exporter will see all traces sampled at 100%. This is due to the Router incorrectly setting the priority sampled flag on spans 100% of the time. This will cause all traces that are sent to Datadog agent to be forwarded on to Datadog, potentially incurring costs.

Update to 1.59.1 to resolve this issue. Datadog users may wish to enable preview_datadog_agent_sampling to enable accurate APM metrics.

[!IMPORTANT] If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🚀 Features
Support DNS resolution strategy configuration (PR #​6109)

The router now supports a configurable DNS resolution strategy for the URLs of coprocessors and subgraphs. The new option is called dns_resolution_strategy and supports the following values:

  • ipv4_only - Only query for A (IPv4) records.
  • ipv6_only - Only query for AAAA (IPv6) records.
  • ipv4_and_ipv6 - Query for both A (IPv4) and AAAA (IPv6) records in parallel.
  • ipv6_then_ipv4 - Query for AAAA (IPv6) records first; if that fails, query for A (IPv4) records.
  • ipv4_then_ipv6(default) - Query for A (IPv4) records first; if that fails, query for AAAA (IPv6) records.

You can change the DNS resolution strategy applied to a subgraph's URL:

traffic_shaping:
  all:
    dns_resolution_strategy: ipv4_then_ipv6

You can also change the DNS resolution strategy applied to a coprocessor's URL:

coprocessor:
  url: http://coprocessor.example.com:8081
  client:
    dns_resolution_strategy: ipv4_then_ipv6

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6109

Configuration options for HTTP/1 max headers and buffer limits (PR #​6194)

This update introduces configuration options that allow you to adjust the maximum number of HTTP/1 request headers and the maximum buffer size allocated for headers.

By default, the router accepts HTTP/1 requests with up to 100 headers and allocates ~400 KiB of buffer space to store them. If you need to handle requests with more headers or require a different buffer size, you can now configure these limits in the router's configuration file:

limits:
  http1_request_max_headers: 200
  http1_request_max_buf_size: 200kib

If you are using the router as a Rust crate, the http1_request_max_buf_size option requires the hyper_header_limits feature and also necessitates using Apollo's fork of the Hyper crate until the changes are merged upstream. You can include this fork by adding the following patch to your Cargo.toml file:

[patch.crates-io]
"hyper" = { git = "https://github.com/apollographql/hyper.git", tag = "header-customizations-20241108" }

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6194

Compress subgraph operations by generating fragments (PR #​6013)

The router now compresses operations sent to subgraphs by default by generating fragment definitions and using them in the operation.

This change enables generate_query_fragments by default while disabling experimental_reuse_query_fragments. When enabled, experimental_reuse_query_fragments attempts to intelligently reuse the fragment definitions from the original operation. However, fragment generation with generate_query_fragments is much faster and produces better outputs in most cases.

If you are relying on the shape of fragments in your subgraph operations or tests, you can opt out of the new algorithm with the configuration below.

Note: The subgraph operations generated by the query planner are not guaranteed consistent release over release. We strongly recommend against relying on the shape of planned subgraph operations, as new router features and optimizations will continuously affect it. We plan to remove experimental_reuse_query_fragments in a future release.

supergraph:
  generate_query_fragments: false
  experimental_reuse_query_fragments: true

By @​lrlna in https://github.com/apollographql/router/pull/6013

Add subgraph request id (PR #​5858)

The router now supports a subgraph request ID that is a unique string identifying a subgraph request and response. It allows plugins and coprocessors to keep some state per subgraph request by matching on this ID. It's available in coprocessors as subgraphRequestId and Rhai scripts as request.subgraph.id and response.subgraph.id.

By @​Geal in https://github.com/apollographql/router/pull/5858

Add extensions.service for all subgraph errors (PR #​6191)

For improved debuggability, the router now supports adding a subgraph's name as an extension to all errors originating from the subgraph.

If include_subgraph_errors is true for a particular subgraph, all errors originating in this subgraph will have the subgraph's name exposed as a service extension.

You can enable subgraph errors with the following configuration:

include_subgraph_errors:
  all: true # Propagate errors from all subgraphs

Note: This option is enabled by default by the router's dev mode.

Consequently, when a subgraph returns an error, it will have a service extension with the subgraph name as its value. The following example shows the extension for a products subgraph:

{
  "data": null,
  "errors": [
    {
      "message": "Invalid product ID",
      "path": [],
      "extensions": {
        "service": "products"
      }
    }
  ]
}

By @​IvanGoncharov in [https://


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • [ ] If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

renovate[bot] avatar Mar 16 '25 13:03 renovate[bot]