federation-jvm-spring-example

chore(deps): update ghcr.io/apollographql/router docker tag to v2

Open renovate[bot] opened this issue 8 months ago • 0 comments

This PR contains the following updates:

Package | Update | Change
ghcr.io/apollographql/router | major | v1.32.0 -> v2.1.0

Release Notes

apollographql/router (ghcr.io/apollographql/router)

v2.1.0

Compare Source

🚀 Features

Connectors: support for traffic shaping (PR #​6737)

Traffic shaping is now supported for connectors. To target a specific source, use the subgraph_name.source_name under the new connector.sources property of traffic_shaping. Settings under connector.all will apply to all connectors. deduplicate_query is not supported at this time.

Example config:

traffic_shaping:
  connector:
    all:
      timeout: 5s
    sources:
      connector-graph.random_person_api:
        global_rate_limit:
          capacity: 20
          interval: 1s
        experimental_http2: http2only
        timeout: 1s

By @​andrewmcgivery in https://github.com/apollographql/router/pull/6737

Connectors: Support TLS configuration (PR #​6995)

Connectors now support TLS configuration for using custom certificate authorities and client certificate authentication.

tls:
  connector:
    sources:
      connector-graph.random_person_api:
        certificate_authorities: ${file.ca.crt}
        client_authentication:
          certificate_chain: ${file.client.crt}
          key: ${file.client.key}

By @​andrewmcgivery in https://github.com/apollographql/router/pull/6995

Update JWT handling (PR #​6930)

This PR updates JWT handling in the AuthenticationPlugin:

  • Users may now set a new config option config.authentication.router.jwt.on_error.
    • When set to the default Error, JWT-related errors will be returned to users (the current behavior).
    • When set to Continue, JWT errors will instead be ignored, and JWT claims will not be set in the request context.
  • When JWTs are processed, whether processing succeeds or fails, the request context will contain a new variable apollo::authentication::jwt_status which notes the result of processing.
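
The two on_error modes can be modeled with a small Python sketch. This is only an illustration of the behavior described above, not the router's code: handle_jwt is a hypothetical helper, and the "claims" key and the status values are placeholders, since the notes above do not say what apollo::authentication::jwt_status actually contains.

```python
JWT_STATUS_KEY = "apollo::authentication::jwt_status"


def handle_jwt(claims, error, on_error="Error"):
    """Model the two on_error modes (hypothetical helper, not router code).

    claims: decoded claims when validation succeeded, else None.
    error: an error message when validation failed, else None.
    Returns (context, error_returned_to_user).
    """
    context = {}
    if error is None:
        context["claims"] = claims  # placeholder key name
        context[JWT_STATUS_KEY] = "success"  # placeholder value
        return context, None
    # The status is recorded whether processing succeeds or fails.
    context[JWT_STATUS_KEY] = f"failure: {error}"  # placeholder value
    if on_error == "Continue":
        # The error is ignored and no claims are set in the context.
        return context, None
    # Default "Error" mode: the JWT error is returned to the user.
    return context, error
```

With on_error set to Continue, later pipeline stages would need to inspect the status entry themselves to tell an unauthenticated request from an authenticated one.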

By @​Velfi in https://github.com/apollographql/router/pull/6930

Add batching.maximum_size configuration option to limit maximum client batch size (PR #​7005)

Add an optional maximum_size parameter to the batching configuration.

  • When specified, the router will reject requests which contain more than maximum_size queries in the client batch.
  • When unspecified, the router performs no size checking (the current behavior).

If the number of queries provided exceeds the maximum batch size, the entire batch fails with error code 422 (Unprocessable Content). For example:

{
  "errors": [
    {
      "message": "Invalid GraphQL request",
      "extensions": {
        "details": "Batch limits exceeded: you provided a batch with 3 entries, but the configured maximum router batch size is 2",
        "code": "BATCH_LIMIT_EXCEEDED"
      }
    }
  ]
}
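
The size check can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the router's implementation: check_batch_size is a hypothetical helper, and only the 422 status and the error shape are taken from the example above.

```python
from typing import Optional


def check_batch_size(batch: list, maximum_size: Optional[int]):
    """Return (status, body) for a client batch (hypothetical helper)."""
    # When maximum_size is unset, no size checking is performed.
    if maximum_size is None or len(batch) <= maximum_size:
        return 200, None
    # Otherwise the entire batch is rejected with 422.
    details = (
        f"Batch limits exceeded: you provided a batch with {len(batch)} entries, "
        f"but the configured maximum router batch size is {maximum_size}"
    )
    body = {
        "errors": [
            {
                "message": "Invalid GraphQL request",
                "extensions": {"details": details, "code": "BATCH_LIMIT_EXCEEDED"},
            }
        ]
    }
    return 422, body


queries = [{"query": "{ a }"}, {"query": "{ b }"}, {"query": "{ c }"}]
status, body = check_batch_size(queries, maximum_size=2)
```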

By @​carodewig in https://github.com/apollographql/router/pull/7005

Introduce PQ manifest hot_reload option for local manifests (PR #​6987)

This change introduces a persisted_queries.hot_reload configuration option to allow the router to hot reload local PQ manifest changes.

If you configure local_manifests, you can set hot_reload to true to automatically reload manifest files whenever they change. This lets you update local manifest files without restarting the router.

persisted_queries:
  enabled: true
  local_manifests:
    - ./path/to/persisted-query-manifest.json
  hot_reload: true

Note: This change explicitly does not piggyback on the existing --hot-reload flag.

By @​trevor-scheer in https://github.com/apollographql/router/pull/6987

Add support to get/set URI scheme in Rhai (Issue #​6897)

This adds support to read and write the scheme from the request.uri.scheme/request.subgraph.uri.scheme functions in Rhai, enabling the ability to switch between http and https for subgraph fetches. For example:

fn subgraph_service(service, subgraph){
    service.map_request(|request|{
        log_info(`${request.subgraph.uri.scheme}`);
        if request.subgraph.uri.scheme == {} {
            log_info("Scheme is not explicitly set");
        }
        request.subgraph.uri.scheme = "https";
        request.subgraph.uri.host = "api.apollographql.com";
        request.subgraph.uri.path = "/api/graphql";
        request.subgraph.uri.port = 1234;
        log_info(`${request.subgraph.uri}`);
    });
}

By @​starJammer in https://github.com/apollographql/router/pull/6906

Add router config validate subcommand (PR #​7016)

Adds new router config validate subcommand to allow validation of a router config file without fully starting up the Router.

./router config validate <path-to-config-file.yaml>

By @​andrewmcgivery in https://github.com/apollographql/router/pull/7016

Enable remote proxy downloads of the Router

This enables users without direct download access to specify a remote proxy mirror location for the GitHub download of the Apollo Router releases.

By @​LongLiveCHIEF in https://github.com/apollographql/router/pull/6667

Add metric to measure cardinality overflow frequency (PR #​6998)

Adds a new counter metric, apollo.router.telemetry.metrics.cardinality_overflow, that is incremented when the cardinality overflow log from opentelemetry-rust occurs. This log means that a metric in a batch has reached a cardinality of > 2000 and that any excess attributes will be ignored.

By @​rregitsky in https://github.com/apollographql/router/pull/6998

Add metrics for value completion errors (PR #​6905)

When the router encounters a value completion error, it is not included in the GraphQL errors array, making it harder to observe. To surface this issue in a more obvious way, router now counts value completion error metrics via the metric instruments apollo.router.graphql.error and apollo.router.operations.error, distinguishable via the code attribute with value RESPONSE_VALIDATION_FAILED.

By @​timbotnik in https://github.com/apollographql/router/pull/6905

Add apollo.router.pipelines metrics (PR #​6967)

When the router reloads, either via schema change or config change, a new request pipeline is created. Existing request pipelines are closed once their requests finish. However, this may not happen if there are ongoing long requests that do not finish, such as Subscriptions.

To enable debugging when request pipelines are being kept around, a new gauge metric has been added:

  • apollo.router.pipelines - The number of request pipelines active in the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration

By @​BrynCooke in https://github.com/apollographql/router/pull/6967

Add apollo.router.open_connections metric (PR #​7023)

To help users to diagnose when connections are keeping pipelines hanging around, the following metric has been added:

  • apollo.router.open_connections - The number of connections open in the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration.
    • server.address - The address that the router is listening on.
    • server.port - The port that the router is listening on if not a unix socket.
    • http.connection.state - Either active or terminating.

You can use this metric to monitor when connections are open via long running requests or keepalive messages.

By @​bryncooke in https://github.com/apollographql/router/pull/7023

Add span events to error spans for connectors and demand control plugin (PR #​6727)

New span events have been added to trace spans which include errors. These span events include the GraphQL error code that relates to the error. So far, this only includes errors generated by connectors and the demand control plugin.

By @​bonnici in https://github.com/apollographql/router/pull/6727

Changes to experimental error metrics (PR #​6966)

In 2.0.0, an experimental metric telemetry.apollo.errors.experimental_otlp_error_metrics was introduced to track errors with additional attributes. A few related changes are included here:

  • Sending these metrics now also respects the subgraph's send flag e.g. telemetry.apollo.errors.subgraph.[all|(subgraph name)].send.
  • A new configuration option telemetry.apollo.errors.subgraph.[all|(subgraph name)].redaction_policy has been added. This flag only applies when redact is set to true. When set to ErrorRedactionPolicy.Strict, error redaction will behave as it has in the past. Setting this to ErrorRedactionPolicy.Extended will allow the extensions.code value from subgraph errors to pass through redaction and be sent to Studio.
  • A warning about incompatibility of error telemetry with connectors will be suppressed when this feature is enabled, since it does support connectors when using the new mode.

By @​timbotnik in https://github.com/apollographql/router/pull/6966

🐛 Fixes

Export gauge instruments (Issue #​6859)

Previously in router 2.x, when using the router's OTel meter_provider() to report metrics from Rust plugins, gauge instruments such as those created using .u64_gauge() weren't exported. The router now exports these instruments.

By @​yanns in https://github.com/apollographql/router/pull/6865

Use batch_processor config for Apollo metrics PeriodicReader (PR #​7024)

The Apollo OTLP batch_processor configurations telemetry.apollo.batch_processor.scheduled_delay and telemetry.apollo.batch_processor.max_export_timeout now also control the Apollo OTLP PeriodicReader export interval and timeout, respectively. This update brings parity between Apollo OTLP metrics and non-Apollo OTLP exporter metrics.

By @​rregitsky in https://github.com/apollographql/router/pull/7024

Reduce Brotli encoding compression level (Issue #​6857)

The Brotli encoding compression level has been changed from 11 to 4 to improve performance and mimic other compression algorithms' fast setting. This value is also a much more reasonable value for dynamic workloads.

By @​carodewig in https://github.com/apollographql/router/pull/7007

CPU count inference improvements for cgroup environments (PR #​6787)

This fixes an issue where the fleet_detector plugin would not correctly infer the CPU limits for a system which used cgroup or cgroup2.
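
For context, CPU limits on Linux are commonly derived from cgroup files: cpu.max under cgroup v2, and cpu.cfs_quota_us with cpu.cfs_period_us under cgroup v1. The Python sketch below shows that general calculation only. It is not the fleet_detector code, and the function names are hypothetical.

```python
def cpus_from_cgroup_v2(cpu_max):
    """Parse cgroup v2 cpu.max contents: '<quota> <period>' or 'max <period>'."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # no CPU limit is set
    return int(quota) / int(period)


def cpus_from_cgroup_v1(cfs_quota_us, cfs_period_us):
    """cgroup v1: a quota of -1 means no limit."""
    if cfs_quota_us < 0 or cfs_period_us <= 0:
        return None
    return cfs_quota_us / cfs_period_us
```

A return value of None means the caller should fall back to the host's CPU count.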

By @​nmoutschen in https://github.com/apollographql/router/pull/6787

Separate entity keys and representation variables in entity cache key (Issue #​6673)

This fix separates the entity keys and representation variable values in the cache key, to avoid issues with @requires for example.

[!IMPORTANT]

If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

By @​bnjjj in https://github.com/apollographql/router/pull/6888

Replace Rhai-specific hot-reload functionality with general hot-reload (PR #​6950)

In Router 2.0, the Rhai hot-reload capability was not working because architectural improvements to the router meant that the entire service stack was no longer re-created for each request.

The fix adds the Rhai source files to the primary list of elements watched by the router (configuration, schema, and so on) and removes the old Rhai-specific file watching logic.

If --hot-reload is enabled, the router will reload on changes to Rhai source code just like it would for changes to configuration, for example.

By @​garypen in https://github.com/apollographql/router/pull/6950

📃 Configuration

Make experimental OTLP error metrics feature flag non-experimental (PR #​7033)

Because the OTLP error metrics feature is being promoted to preview from experimental, this change updates its feature flag name from experimental_otlp_error_metrics to preview_extended_error_metrics.

By @​merylc in https://github.com/apollographql/router/pull/7033

[!TIP] All notable changes to Router v2.x after its initial release will be documented in this file. To see previous history, see the changelog prior to v2.0.0.

v2.0.0

Compare Source

This is a major release of the router containing significant new functionality and improvements to behaviour, resulting in more predictable resource utilisation and decreased latency.

Router 2.0.0 introduces general availability of Apollo Connectors, helping integrate REST services in router deployments.

This entry summarizes the overall changes in 2.0.0. To learn more details, go to the What's New in router v2.x page.

To upgrade to this version, follow the upgrading from router 1.x to 2.x guide.

❗ BREAKING CHANGES ❗

In order to make structural improvements in the router and upgrade some of our key dependencies, some breaking changes were introduced in this major release. Most of the breaking changes are in the areas of configuration and observability. All details on what's been removed and changed can be found in the upgrade guide.

🚀 Features

Router 2.0.0 comes with many new features and improvements. While all the details can be found in the What's New guide, the following features are the ones we are most excited about.

Simplified integration of REST services using Apollo Connectors. Apollo Connectors are a declarative programming model for GraphQL, allowing you to plug your existing REST services directly into your graph. Once integrated, client developers gain all the benefits of GraphQL, and API owners gain all the benefits of GraphOS, including incorporation into a supergraph for a comprehensive, unified view of your organization's data and services. This detailed guide outlines how to configure connectors with the router. Moving from Connectors Preview can be accomplished by following the steps in the Connectors GA upgrade guide.

Predictable resource utilization and availability with back pressure. Back pressure was not maintained in router 1.x, which meant the router accepted all requests. This caused issues for routers handling high levels of traffic. Router 2.0.0 improves the handling of back pressure so that traffic shaping measures are more effective, while also improving integration with telemetry. These improvements in turn allow the router to observe timeout and traffic shaping restrictions correctly. You can read about traffic shaping changes in this section of the upgrade guide.

Metrics now all follow OpenTelemetry naming conventions. Some of router's earlier metrics were created before the introduction of OpenTelemetry, resulting in naming inconsistencies. Along with standardising metrics to OpenTelemetry, Apollo operation usage reporting now also defaults to using OpenTelemetry in router 2.0.0. Quite a few existing metrics had to be changed in order to do this properly and correctly, and we encourage you to carefully read through the upgrade guide for all the metrics changes.

Improved validation of CORS configurations, preventing silent failures. While CORS configuration did not change in router 2.0.0, we did improve CORS value validation. This results in things like invalid regex or unknown allow_methods returning errors early and preventing starting the router.

Documentation for context keys, improving usability for advanced customers. Router 2.0.0 creates consistent naming semantics for request context keys, which are used to share data across internal router pipeline stages. If you are relying on context entries in Rust plugins, Rhai scripts, coprocessors, or telemetry selectors, please refer to this section to see what keys changed.

📃 Configuration

Some changes to router configuration options were necessary in this release. Descriptions for both breaking changes to previous configuration and configuration for new features can be found in the upgrade guide.

🛠 Maintenance

Many external Rust dependencies (crates) have been updated to modern versions where possible. As the Rust ecosystem evolves, so does the router. Keeping these crates up to date helps keep the router secure and stable.

Major upgrades in this version include:

  • axum
  • http
  • hyper
  • opentelemetry
  • redis

v1.61.1

Compare Source

🐛 Fixes
Use correct default values on omitted OTLP endpoints (PR #​6931)

Previously, when the configuration didn't specify an OTLP endpoint, the Router would always default to http://localhost:4318. However, port 4318 is the correct default only for the HTTP protocol, while port 4317 should be used for gRPC.

Additionally, all other telemetry defaults in the Router configuration consistently use 127.0.0.1 as the hostname rather than localhost.

With this change, the Router now uses:

  • http://127.0.0.1:4317 as the default for gRPC protocol
  • http://127.0.0.1:4318 as the default for HTTP protocol

This ensures protocol-appropriate port defaults and consistent hostname usage across all telemetry configurations.
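
The defaulting rule amounts to a lookup keyed on protocol, as this Python sketch shows. The function and constant names are hypothetical; only the two default URLs come from the notes above.

```python
DEFAULT_OTLP_ENDPOINTS = {
    "grpc": "http://127.0.0.1:4317",
    "http": "http://127.0.0.1:4318",
}


def resolve_otlp_endpoint(protocol, endpoint=None):
    """Return the configured endpoint, or the protocol's default when omitted."""
    if endpoint is not None:
        return endpoint
    return DEFAULT_OTLP_ENDPOINTS[protocol]
```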

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6931

Separate entity keys and representation variables in entity cache key (Issue #​6673)

This fix separates the entity keys and representation variable values in the cache key, to avoid issues with @requires for example.

By @​bnjjj in https://github.com/apollographql/router/pull/6888

🔒 Security
Add batching.maximum_size configuration option to limit maximum client batch size (PR #​7005)

Add an optional maximum_size parameter to the batching configuration.

  • When specified, the router will reject requests which contain more than maximum_size queries in the client batch.
  • When unspecified, the router performs no size checking (the current behavior).

If the number of queries provided exceeds the maximum batch size, the entire batch fails with error code 422 (Unprocessable Content). For example:

{
  "errors": [
    {
      "message": "Invalid GraphQL request",
      "extensions": {
        "details": "Batch limits exceeded: you provided a batch with 3 entries, but the configured maximum router batch size is 2",
        "code": "BATCH_LIMIT_EXCEEDED"
      }
    }
  ]
}

By @​carodewig in https://github.com/apollographql/router/pull/7005

🔍 Debuggability
Add apollo.router.pipelines metrics (PR #​6967)

When the router reloads, either via schema change or config change, a new request pipeline is created. Existing request pipelines are closed once their requests finish. However, this may not happen if there are ongoing long requests that do not finish, such as Subscriptions.

To enable debugging when request pipelines are being kept around, a new gauge metric has been added:

  • apollo.router.pipelines - The number of request pipelines active in the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration

By @​BrynCooke in https://github.com/apollographql/router/pull/6967

Add apollo.router.open_connections metric (PR #​7023)

To help users to diagnose when connections are keeping pipelines hanging around, the following metric has been added:

  • apollo.router.open_connections - The number of connections open in the router
    • schema.id - The Apollo Studio schema hash associated with the pipeline.
    • launch.id - The Apollo Studio launch id associated with the pipeline (optional).
    • config.hash - The hash of the configuration.
    • server.address - The address that the router is listening on.
    • server.port - The port that the router is listening on if not a unix socket.
    • state - Either active or terminating.

You can use this metric to monitor when connections are open via long running requests or keepalive messages.

By @​BrynCooke in https://github.com/apollographql/router/pull/7009

v1.61.0

Compare Source

🚀 Features
Query planner dry-run option (PR #​6656)

This PR adds a new dry-run option to the Apollo-Expose-Query-Plan header value that emits the query plans back to Studio for visualizations. This new value will only emit the query plan, and abort execution. This can be helpful for tools like rover, where query plan generation is needed but not full runtime, or for potentially prewarming query plan caches out of band.

curl --request POST --include \
     --header 'Accept: application/json' \
     --header 'Apollo-Expose-Query-Plan: dry-run' \
     --url 'http://127.0.0.1:4000/' \
     --data '{"query": "{ topProducts { upc name } }"}'

By @​aaronArinder and @​lennyburdette in https://github.com/apollographql/router/pull/6656.

Enable Remote Proxy Downloads

This enables users without direct download access to specify a remote proxy mirror location for the GitHub download of the Apollo Router releases.

By @​LongLiveCHIEF in https://github.com/apollographql/router/pull/6667

🐛 Fixes
Header propagation rules passthrough (PR #​6690)

Header propagation contains logic to prevent headers from being propagated more than once. This was broken in https://github.com/apollographql/router/pull/6281, which always considered a header propagated regardless of whether a rule actually matched.

This PR alters the logic so that a header is marked as fixed only when it's populated.

The following will now work again:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b

Note that defaulting a header WILL populate it, so make sure to include your defaults last in your propagation rules.

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
          default: defaulted # This will prevent any further rule evaluation for header `b`
      - propagate:
          named: b

Instead, make sure that your headers are defaulted last:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b
          default: defaulted # OK
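
The ordering behavior can be modeled with a short Python simulation. It is a sketch of the rule described above (a header is fixed once a rule populates it), not the router's implementation: propagate is a hypothetical function and the rules are plain dicts mirroring the YAML keys.

```python
def propagate(rules, incoming):
    """Apply propagate rules in order; the first rule that populates a header wins."""
    outgoing = {}
    for rule in rules:
        source = rule["named"]
        target = rule.get("rename", source)
        if target in outgoing:
            # Already populated by an earlier rule, so this rule is skipped.
            continue
        if source in incoming:
            outgoing[target] = incoming[source]
        elif "default" in rule:
            # A default also populates the header, which blocks later rules.
            outgoing[target] = rule["default"]
    return outgoing


default_first = [
    {"named": "a", "rename": "b", "default": "defaulted"},
    {"named": "b"},
]
default_last = [
    {"named": "a", "rename": "b"},
    {"named": "b", "default": "defaulted"},
]
request = {"b": "from-client"}
print(propagate(default_first, request))  # {'b': 'defaulted'}
print(propagate(default_last, request))   # {'b': 'from-client'}
```

With the default on the first rule, the client's b header is never propagated. Moving the default to the last rule lets the incoming value through.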

By @​BrynCooke in https://github.com/apollographql/router/pull/6690

Entity cache: fix directive conflicts in cache-control header (Issue #​6441)

Previously, unnecessary directives were created in the cache-control header. The router now filters out unnecessary values from the cache-control header when the request resolves. For example, given max-age=10, no-cache, must-revalidate, no-store, the expected value for the cache-control header is simply no-store. Please see the MDN docs for the justification of this reasoning: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#preventing_storing
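
A minimal Python sketch of the no-store case follows. It covers only the example given above and assumes that values without no-store are left unchanged. simplify_cache_control is a hypothetical name, and the router's actual filtering may handle more cases.

```python
def simplify_cache_control(value):
    """Collapse a Cache-Control value to 'no-store' when that directive is present."""
    directives = [d.strip() for d in value.split(",") if d.strip()]
    if any(d.lower() == "no-store" for d in directives):
        # no-store makes the remaining caching directives redundant.
        return "no-store"
    return ", ".join(directives)
```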

By @​bnjjj in https://github.com/apollographql/router/pull/6543

Query Planning: fix __typename selections in sibling typename optimization

The query planner uses an optimization technique called "sibling typename", which attaches __typename selections to their sibling selections so the planner won't need to plan them separately.

Previously, when there were multiple identical selections and one of them has a __typename attached, the query planner could pick the one without the attachment, effectively losing a __typename selection.

Now, the query planner favors the one with a __typename attached without losing the __typename selection.

By @​duckki in https://github.com/apollographql/router/pull/6824

📃 Configuration
Promote experimental_otlp_tracing_sampler config to stable (PR #​6070)

The router's OTLP tracing sampler feature that was previously experimental is now generally available.

If you used its experimental configuration, you should migrate to the new configuration option:

  • telemetry.apollo.experimental_otlp_tracing_sampler is now telemetry.apollo.otlp_tracing_sampler

The experimental configuration option is now deprecated. It remains functional but will log warnings.

By @​garypen in https://github.com/apollographql/router/pull/6070

Promote experimental_local_manifests config for persisted queries to stable

The experimental_local_manifests PQ configuration option is being promoted to stable. This change updates the configuration option name and any references to it, as well as the related documentation. The experimental_ usage remains valid as an alias for existing usages.

By @​trevor-scheer in https://github.com/apollographql/router/pull/6564

🛠 Maintenance
Reduce demand control allocations on start/reload (PR #​6754)

When demand control is enabled, the router now preallocates capacity for demand control's processed schema and shrinks to fit after processing. When it's disabled, the router skips the type processing entirely to minimize startup impact.

By @​tninesling in https://github.com/apollographql/router/pull/6754

v1.60.1

Compare Source

🐛 Fixes
Header propagation rules passthrough (PR #​6690)

Header propagation contains logic to prevent headers from being propagated more than once. This was broken in https://github.com/apollographql/router/pull/6281, which always considered a header propagated regardless of whether a rule actually matched.

This PR alters the logic so that a header is marked as fixed only when it's populated.

The following will now work again:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b

Note that defaulting a header WILL populate it, so make sure to include your defaults last in your propagation rules.

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
          default: defaulted # This will prevent any further rule evaluation for header `b`
      - propagate:
          named: b

Instead, make sure that your headers are defaulted last:

headers:
  all:
    request:
      - propagate:
          named: a
          rename: b
      - propagate:
          named: b
          default: defaulted # OK

By @​BrynCooke in https://github.com/apollographql/router/pull/6690

Entity cache: fix directive conflicts in cache-control header (Issue #​6441)

Previously, unnecessary directives were created in the cache-control header. The router now filters out unnecessary values from the cache-control header when the request resolves. For example, given max-age=10, no-cache, must-revalidate, no-store, the expected value for the cache-control header is simply no-store. Please see the MDN docs for the justification of this reasoning: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#preventing_storing

By @​bnjjj in https://github.com/apollographql/router/pull/6543

Resolve regressions in fragment compression for certain operations (PR #​6651)

In v1.58.0 we introduced a new compression strategy for subgraph GraphQL operations to replace an older, more complicated algorithm.

While we were able to validate improvements for a majority of cases, some regressions still surfaced. To address this, we are extending it to compress more operations with the following outcomes:

  • The P99 overhead of running the new compression algorithm on the largest operations in our corpus is now just 10ms
  • In case of better compression, at P99 it shrinks the operations by 50Kb when compared to the old algorithm
  • In case of worse compression, at P99 it only adds an additional 108 bytes compared to the old algorithm, which was an acceptable trade-off versus added complexity

By @​dariuszkuc in https://github.com/apollographql/router/pull/6651

v1.60.0

Compare Source

🚀 Features
Improve BatchProcessor observability (Issue #​6558)

A new metric has been introduced to allow observation of how many spans are being dropped by a telemetry batch processor.

  • apollo.router.telemetry.batch_processor.errors - The number of errors encountered by exporter batch processors.
    • name: One of apollo-tracing, datadog-tracing, jaeger-collector, otlp-tracing, zipkin-tracing.
    • error: One of channel closed, channel full.

By observing the number of spans dropped it is possible to estimate what batch processor settings will work for you.

In addition, the log message for dropped spans will now indicate which batch processor is affected.

By @​bryncooke in https://github.com/apollographql/router/pull/6558

🐛 Fixes
Improve performance of query hashing by using a precomputed schema hash (PR #​6622)

The router now uses a simpler and faster query hashing algorithm with more predictable CPU and memory usage. This improvement is enabled by using a precomputed hash of the entire schema, rather than computing and hashing the subset of types and fields used by each query.
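
The idea can be illustrated with a small Python sketch: hash the whole schema once, then mix that fixed digest into each query hash. The choice of SHA-256, the separator byte, and the exact inputs are assumptions made for illustration. The notes above do not describe the router's actual algorithm, and QueryHasher is a hypothetical class.

```python
import hashlib


class QueryHasher:
    def __init__(self, schema_sdl):
        # Hash the entire schema once, up front.
        self.schema_hash = hashlib.sha256(schema_sdl.encode("utf-8")).digest()

    def hash_query(self, query, operation_name=None):
        # Per query, only the precomputed schema hash and the query inputs are
        # hashed, so the cost does not depend on which types the query touches.
        h = hashlib.sha256()
        h.update(self.schema_hash)
        h.update(b"\0")
        h.update(query.encode("utf-8"))
        h.update(b"\0")
        h.update((operation_name or "").encode("utf-8"))
        return h.hexdigest()
```

The trade-off in such a scheme is that any schema change alters every query hash, even for queries that do not use the changed types.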

For more details on why these design decisions were made, please see the PR description.

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6622

Truncate invalid error paths (PR #​6359)

This fix addresses an issue where the router was silently dropping subgraph errors that included invalid paths.

According to the GraphQL Specification an error path must point to a response field:

If an error can be associated to a particular field in the GraphQL result, it must contain an entry with the key path that details the path of the response field which experienced the error.

The router now truncates the path to the nearest valid field path if a subgraph error includes a path that can't be matched to a response field.
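
One way to read "nearest valid field path" is the longest prefix of the error path that resolves in the response data. The Python sketch below implements that interpretation. It is an assumption about the behavior, not the router's code, and truncate_error_path is a hypothetical name.

```python
def truncate_error_path(path, data):
    """Return the longest prefix of `path` that resolves inside `data`."""
    valid = []
    current = data
    for segment in path:
        if isinstance(segment, int) and isinstance(current, list) and 0 <= segment < len(current):
            current = current[segment]
        elif isinstance(segment, str) and isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            # First segment that does not match the response: stop here.
            break
        valid.append(segment)
    return valid


data = {"user": {"posts": [{"title": "x"}]}}
truncated = truncate_error_path(["user", "posts", 0, "author", "name"], data)
```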

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6359

Eagerly init subgraph operation for subscription primary nodes (PR #​6509)

When subgraph operations are deserialized, typically from a query plan cache, they are not automatically parsed into a full document. Instead, each node needs to initialize its operation(s) prior to execution. With this change, the primary node inside SubscriptionNode is initialized in the same way as other nodes in the plan.

By @​tninesling in https://github.com/apollographql/router/pull/6509

Fix increased memory usage in sysinfo since Router 1.59.0 (PR #​6634)

In version 1.59.0, Apollo Router started using the sysinfo crate to gather metrics about available CPUs and RAM. By default, that crate uses rayon internally to parallelize its handling of system processes. In turn, rayon creates a pool of long-lived threads.

In a particular benchmark on a 32-core Linux server, this caused resident memory use to increase by about 150 MB. This is likely a combination of stack space (which only gets freed when the thread terminates) and per-thread space reserved by the heap allocator to reduce cross-thread synchronization cost.

This regression is now fixed by:

  • Disabling sysinfo’s use of rayon, so the thread pool is not created and system processes information is gathered in a sequential loop.
  • Making sysinfo not gather that information in the first place since Router does not use it.

By @​SimonSapin in https://github.com/apollographql/router/pull/6634

Optimize demand control lookup (PR #​6450)

The performance of demand control in the router has been optimized.

Previously, demand control could reduce router throughput due to its extra processing required for scoring.

This fix improves performance by shifting more data to be computed at plugin initialization and consolidating lookup queries:

  • Cost directives for arguments are now stored in a map alongside those for field definitions
  • All precomputed directives are bundled into a struct for each field, along with that field's extended schema type. This reduces 5 individual lookups to a single lookup.
  • Response scoring was looking up each field's definition twice. This is now reduced to a single lookup.

By @​tninesling in https://github.com/apollographql/router/pull/6450
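
As a rough illustration of the consolidation described above, the sketch below precomputes one bundled record per field at initialization so that scoring needs a single lookup per field. It is not the router's implementation (which is written in Rust). The names `FieldCostInfo`, `build_cost_index`, and `score_field`, the input schema shape, and the scoring formula are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FieldCostInfo:
    """Hypothetical bundle of everything scoring needs for one field."""
    field_type: str                    # the field's schema type name
    cost: Optional[int] = None         # weight from a cost directive, if any
    list_size: Optional[int] = None    # assumed list size, if any
    argument_costs: Dict[str, int] = field(default_factory=dict)


def build_cost_index(schema: dict) -> Dict[Tuple[str, str], FieldCostInfo]:
    """Run once at initialization: one precomputed entry per (type, field)."""
    index = {}
    for type_name, fields in schema.items():
        for field_name, spec in fields.items():
            index[(type_name, field_name)] = FieldCostInfo(
                field_type=spec["type"],
                cost=spec.get("cost"),
                list_size=spec.get("list_size"),
                argument_costs=dict(spec.get("argument_costs", {})),
            )
    return index


def score_field(index, type_name, field_name, arguments=()):
    """Per-request path: a single dictionary lookup per field.

    The formula is a simplified stand-in, not the router's cost model.
    """
    info = index[(type_name, field_name)]
    base = info.cost if info.cost is not None else 1
    args = sum(info.argument_costs.get(name, 0) for name in arguments)
    return (base + args) * (info.list_size or 1)
```

The point of the design is that the per-request path touches only `index`; anything that depends solely on the schema is computed once up front.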

Fix missing Content-Length header in subgraph requests (Issue #​6503)

A change in 1.59.0 caused the Router to send requests to subgraphs without a Content-Length header, which would cause issues with some GraphQL servers that depend on that header.

This solves the underlying bug and reintroduces the Content-Length header.

By @​nmoutschen in https://github.com/apollographql/router/pull/6538

🛠 Maintenance
Remove the legacy query planner (PR #​6418)

The legacy query planner has been removed in this release. In the previous release, router v1.58, it was no longer used by default but was still available through the experimental_query_planner_mode configuration key. That key is now removed.

Also removed are configuration keys which were only relevant to the legacy planner:

  • supergraph.query_planning.experimental_parallelism: the new planner can always use available parallelism.
  • supergraph.experimental_reuse_query_fragments: this experimental algorithm that attempted to reuse fragments from the original operation while forming subgraph requests is no longer present. Instead, by default new fragment definitions are generated based on the shape of the subgraph operation.

By @​SimonSapin in https://github.com/apollographql/router/pull/6418

Migrate various metrics to OTel instruments (PR #​6476, PR #​6356, PR #​6539)

Various metrics using our legacy mechanism based on the tracing crate are migrated to OTel instruments.

By @​goto-bus-stop in https://github.com/apollographql/router/pull/6476, https://github.com/apollographql/router/pull/6356, https://github.com/apollographql/router/pull/6539

📚 Documentation
Add instrumentation configuration examples (PR #​6487)

The docs for router telemetry have new example configurations for common use cases for selectors and conditions.

By @​shorgi in https://github.com/apollographql/router/pull/6487

🧪 Experimental
Remove experimental_retry option (PR #​6338)

The experimental_retry option has been removed due to its limited use and functionality during its experimental phase.

By @​bnjjj in https://github.com/apollographql/router/pull/6338

v1.59.2

Compare Source

[!IMPORTANT]

This release contains important fixes which address resource utilization regressions which impacted Router v1.59.0 and v1.59.1. These regressions were in the form of:

  1. A small baseline increase in memory usage; AND
  2. Additional per-request CPU and memory usage for queries which included references to abstract types with a large number of implementations

If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🐛 Fixes
Improve performance of query hashing by using a precomputed schema hash (PR #​6622)

The router now uses a simpler and faster query hashing algorithm with more predictable CPU and memory usage. This improvement is enabled by using a precomputed hash of the entire schema, rather than computing and hashing the subset of types and fields used by each query.

For more details on why these design decisions were made, please see the PR description.

By @​IvanGoncharov in https://github.com/apollographql/router/pull/6622
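
The idea of a precomputed schema hash can be sketched as follows. This is a minimal illustration under stated assumptions, not the router's actual algorithm: the `QueryHasher` class, the choice of SHA-256, and the key layout are hypothetical.

```python
import hashlib


class QueryHasher:
    """Hypothetical hasher that mixes a precomputed schema hash into each key."""

    def __init__(self, schema_sdl: str):
        # Computed once per schema load, not once per query.
        self.schema_hash = hashlib.sha256(schema_sdl.encode("utf-8")).digest()

    def cache_key(self, query: str, operation_name: str = "") -> str:
        # Per-query work is proportional to the query text only; no walk
        # over the types and fields the query touches is needed.
        h = hashlib.sha256()
        h.update(self.schema_hash)
        h.update(b"\0")
        h.update(operation_name.encode("utf-8"))
        h.update(b"\0")
        h.update(query.encode("utf-8"))
        return h.hexdigest()
```

In a scheme like this one, any change to the schema changes every key, even for queries that do not use the changed types. That is the cost paid for the simpler and more predictable per-query hashing.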

Fix increased memory usage in sysinfo since Router 1.59.0 (PR #​6634)

In version 1.59.0, Apollo Router started using the sysinfo crate to gather metrics about available CPUs and RAM. By default, that crate uses rayon internally to parallelize its handling of system processes. In turn, rayon creates a pool of long-lived threads.

In a particular benchmark on a 32-core Linux server, this caused resident memory use to increase by about 150 MB. This is likely a combination of stack space (which only gets freed when the thread terminates) and per-thread space reserved by the heap allocator to reduce cross-thread synchronization cost.

This regression is now fixed by:

  • Disabling sysinfo’s use of rayon, so the thread pool is not created and system processes information is gathered in a sequential loop.
  • Making sysinfo not gather that information in the first place since Router does not use it.

By @​SimonSapin in https://github.com/apollographql/router/pull/6634

v1.59.1

Compare Source

[!IMPORTANT]

This release was impacted by a resource utilization regression which was fixed in v1.59.2. See the release notes for that release for more details. As a result, we recommend using v1.59.2 rather than v1.59.1 or v1.59.0.

🐛 Fixes
Fix transmitted header value for Datadog priority sampling resolution (PR #​6017)

The router now transmits correct values of x-datadog-sampling-priority to downstream services.

Previously, an x-datadog-sampling-priority of -1 was incorrectly converted to 0 for downstream requests, and 2 was incorrectly converted to 1. When propagating to downstream services, this resulted in values of USER_REJECT being incorrectly transmitted as AUTO_REJECT.
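
For reference, Datadog's sampling priorities are USER_REJECT (-1), AUTO_REJECT (0), AUTO_KEEP (1), and USER_KEEP (2). The sketch below models the old and new propagation behavior described above. The function names are hypothetical, and the clamp in `propagate_buggy` is only a model that reproduces the described conversions, not the router's code.

```python
from enum import IntEnum


class SamplingPriority(IntEnum):
    USER_REJECT = -1
    AUTO_REJECT = 0
    AUTO_KEEP = 1
    USER_KEEP = 2


def propagate_buggy(priority: int) -> int:
    """Models the old behavior: -1 became 0 and 2 became 1."""
    return min(max(priority, 0), 1)


def propagate_fixed(priority: int) -> int:
    """Models the fixed behavior: a valid priority passes through unchanged."""
    return int(SamplingPriority(priority))
```

With the old behavior, `SamplingPriority(propagate_buggy(-1))` is `AUTO_REJECT`, so a user's explicit decision is downgraded to an automatic one before it reaches downstream services.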

Enable accurate Datadog APM metrics (PR #​6017)

The router supports a new preview feature, the preview_datadog_agent_sampling option, to enable sending all spans to the Datadog Agent so APM metrics and views are accurate.

Previously, the sampler option in telemetry.exporters.tracing.common.sampler wasn't Datadog-aware. To get accurate Datadog APM metrics, all spans must be sent to the Datadog Agent with a psr or sampling.priority attribute set appropriately to record the sampling decision.

The preview_datadog_agent_sampling option enables accurate Datadog APM metrics. It should be used when exporting to the Datadog Agent, via OTLP or Datadog-native.

telemetry:
  exporters:
    tracing:
      common:
        # Only 10 percent of spans will be forwarded from the Datadog agent to Datadog. Experiment to find a value that is good for you!
        sampler: 0.1
        # Send all spans to the Datadog agent.
        preview_datadog_agent_sampling: true

Using these options can decrease your Datadog bill, because you will be sending only a percentage of spans from the Datadog Agent to Datadog.

[!IMPORTANT]

  • Users must enable preview_datadog_agent_sampling to get accurate APM metrics. Users that have been using recent versions of the router will have to modify their configuration to retain full APM metrics.
  • The router doesn't support in-agent ingestion control.
  • Configuring traces_per_second in the Datadog Agent won't dynamically adjust the router's sampling rate to meet the target rate.
  • Sending all spans to the Datadog Agent may require that you tweak the batch_processor settings in your exporter config. This applies to both OTLP and Datadog native exporters.

See the updated Datadog tracing documentation for more information on configuration options and their implications.

Fix non-parent sampling (PR #​6481)

When the user specifies a non-parent sampler, the router should ignore the sampling information from upstream and use its own sampling rate.

The following configuration did not work correctly:

telemetry:
  exporters:
    tracing:
      common:
        service_name: router
        sampler: 0.00001
        parent_based_sampler: false

With this configuration, all spans were sampled. This is now fixed, and the router correctly ignores any upstream sampling decision.

By @​BrynCooke in https://github.com/apollographql/router/pull/6481
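
The intended decision logic can be sketched as below. This is an illustrative model only: `should_sample` and its parameters are hypothetical, and `trace_ratio_draw` stands in for a value in [0, 1) derived from the trace ID.

```python
from typing import Optional


def should_sample(
    trace_ratio_draw: float,
    sampler: float,
    parent_based_sampler: bool,
    parent_sampled: Optional[bool],
) -> bool:
    """Decide whether a trace is sampled.

    parent_sampled is the upstream decision, or None when there is no parent.
    """
    if parent_based_sampler and parent_sampled is not None:
        # Parent-based sampling: honor the upstream decision.
        return parent_sampled
    # Non-parent sampling (or a root span): apply the configured rate only.
    return trace_ratio_draw < sampler
```

With `parent_based_sampler` set to `false`, an upstream "sampled" flag has no effect on the outcome, which is the behavior the fix restores.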

v1.59.0

Compare Source

[!IMPORTANT] Router versions 1.53.0 through 1.59.0 have an issue where users of the Datadog exporter will see all traces sampled at 100%. This is due to the Router incorrectly setting the priority sampled flag on spans 100% of the time. This will cause all traces that are sent to the Datadog Agent to be forwarded on to Datadog, potentially incurring costs.

Update to 1.59.1 to resolve this issue. Datadog users may wish to enable preview_datadog_agent_sampling to enable accurate APM metrics.

[!IMPORTANT]

This release was impacted by a resource utilization regression which was fixed in v1.59.2. See the release notes for that release for more details. As a result, we recommend using v1.59.2 rather than v1.59.1 or v1.59.0.

[!IMPORTANT] If you have enabled distributed query plan caching, updates to the query planner in this release will result in query plan caches being regenerated rather than reused. On account of this, you should anticipate additional cache regeneration cost when updating to this router version while the new query plans come into service.

🚀 Features
General availability of native query planner

The router's native, Rust-based, query planner is now generally available and enabled by default.

The native query planner achieves better performance for a variety of graphs. In our tests, we observe:

  • 10x median improvement in query planning time (observed via apollo.router.query_planning.plan.duration)
  • 2.9x improvement in router’s CPU utilization
  • 2.2x improvement in router’s memory usage

Note: you can expect generated plans and subgraph operations in the native query planner to have slight differences when compared to the legacy, JavaScript-based query planner. We've ascertained these differences to be semantically insignificant, based on comparing ~2.5 million known unique user operations in GraphOS as well as comparing ~630 million operations across actual router deployments in shadow mode over a four-month period.

The native query planner supports Federation v2 supergraphs. If you are using Federation v1 today, see our migration guide on how to update your composition build step. Subgraph changes are typically not needed.

The legacy, JavaScript-based query planner is deprecated in this release, but you can still switch back to it if you are still using a Federation v1 supergraph:

experimental_query_planner_mode: legacy

Note: The subgraph operations generated by the query planner are not guaranteed to be consistent from release to release. We strongly recommend against relying on the shape of planned subgraph operations, as new router features and optimizations will continuously affect it.

By @​sachindshinde, @​goto-bus-stop, @​duckki, @​TylerBloom, @​SimonSapin, @​dariuszkuc, @​lrlna, @​clenfest, and @​o0Ignition0o.

Ability to skip persisted query list safelisting enforcement via plugin (PR #​6403)

If safelisting is enabled, a router_service plugin can skip enforcement of the safelist (including the require_id check) by adding the key apollo_persisted_queries::safelist::skip_enforcement with value true to the request context.

Note: this doesn't affect the logging of unknown operations by the persisted_queries.log_unknown option.

In cases where an operation would have been denied but is allowed due to the context key existing, the attribute persisted_queries.safelist.enforcement_skipped is set on the apollo.router.operations.persisted_queries metric with value true.

By @​glasser in https://github.com/apollographql/router/pull/6403
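
The enforcement decision described above can be modeled roughly as follows. Only the context key and the metric attribute name come from the release notes; the `check_safelist` function, its parameters, and the use of plain dictionaries for the request context and metric attributes are assumptions made for the sketch.

```python
SKIP_KEY = "apollo_persisted_queries::safelist::skip_enforcement"
SKIPPED_ATTR = "persisted_queries.safelist.enforcement_skipped"


def check_safelist(operation_id, safelist, context, metric_attributes):
    """Return True if the operation may proceed (hypothetical model)."""
    if operation_id in safelist:
        return True
    if context.get(SKIP_KEY) is True:
        # The operation would have been denied; record that enforcement
        # was skipped because a plugin set the context key.
        metric_attributes[SKIPPED_ATTR] = True
        return True
    return False
```

In this model the attribute is recorded only when the key changes the outcome, matching the note that it is set for operations that would otherwise have been denied.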

Add fleet awareness plugin (PR #​6151)

A new fleet_awareness plugin has been added that reports telemetry to Apollo about the configuration and deployment of the router.

The reported telemetry includes CPU and memory usage, CPU frequency, and other deployment characteristics such as operating system and cloud provider. For more details, along with a full list of data captured and how to opt out, go to our data privacy policy.

By @​jonathanrainer, [@​nmoutschen](https://redirect.github.com/n


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • [ ] If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

renovate[bot], Mar 16 '25 13:03