Router 2.2.1 throws this telemetry error, which is unknown:
2025-05-07T15:12:01.905848Z ERROR resource{service.namespace="rh-graphql",service.version="2.2.1-rc.1",service.name="rhg-router",process.executable.name="router",} tokio-runtime-worker ThreadId(10) apollo_router::plugins::telemetry::error_handler: apollo-router/src/plugins/telemetry/error_handler.rs:92 OpenTelemetry metric error occurred: Metrics exporter otlp failed with the grpc server returns error (Unknown error): , detailed error message: h2 protocol error: http2 error tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", FRAME_SIZE_ERROR, Library) }))
FRAME_SIZE_ERROR is not documented in the errors list at https://www.apollographql.com/docs/graphos/routing/errors.
The Router produces this log for telemetry on version 2.2.1.
You didn't specify whether you saw this with Router 2.2.0 or any other 2.x version, but I don't think this is in any way specific to Router 2.2.1.
To understand what's going on here, though, we'd need to know specifically what your configuration is, particularly the OTLP metrics configuration: what is your metrics OTLP endpoint string? You can redact the hostname, but knowing the full value you have set would be useful (these are in telemetry.exporters in your config). From there, our next question will likely be what the telemetry endpoint itself is configured to be.
Overall, FRAME_SIZE_ERROR is an HTTP/2 error, and I'm pretty sure it's being returned by whatever endpoint you have configured there, not by the Router; you're just seeing the error surface in your logs. We don't document every HTTP/2 error, but this is probably a protocol negotiation failure where the endpoint isn't quite managing to complete the negotiation. Various resources come up in a search, but I think we should zoom in on the value of the endpoint and whether you need to specify the correct protocol in your configuration. OpenTelemetry has changed its mind over the years about how ports 4317 and 4318 work, so this may be something that once worked and then broke as the OpenTelemetry standard normalized, which happened during the lifetime of the Router.
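For illustration, here is a minimal sketch of the two consistent pairings (the hostname is hypothetical). OTLP/gRPC conventionally listens on 4317 and OTLP/HTTP on 4318, and crossing protocol and port is a common way to end up with h2-level failures like this one:

telemetry:
  exporters:
    metrics:
      otlp:
        enabled: true
        # OTLP/gRPC speaks HTTP/2 and conventionally listens on port 4317
        endpoint: "http://collector.example:4317"
        protocol: grpc
        # ...whereas OTLP/HTTP conventionally listens on port 4318:
        # endpoint: "http://collector.example:4318"
        # protocol: http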
telemetry:
  exporters:
    metrics:
      otlp:
        enabled: true
        endpoint: "${env.OTEL_EXPORTER_OTLP_ENDPOINT}"
        protocol: http
      common:
        service_name: "${env.OTEL_SERVICE_NAME}"
        service_namespace: rh-graphql
        resource:
          service.name: "${env.OTEL_SERVICE_NAME}"
          service_namespace: rh-graphql
    logging:
      common:
        service_name: "${env.OTEL_SERVICE_NAME}"
        service_namespace: rh-graphql
        resource:
          service.name: "${env.OTEL_SERVICE_NAME}"
          service_namespace: rh-graphql
      stdout:
        enabled: true
        format:
          text:
            ansi_escape_codes: true
            display_current_span: true
            display_filename: true
            display_level: true
            display_line_number: true
            display_resource: true
            display_service_name: true
            display_service_namespace: true
            display_span_id: true
            display_trace_id: true
            display_span_list: true
            display_target: true
            display_thread_id: true
            display_thread_name: true
            display_timestamp: true
        tty_format:
          text:
            ansi_escape_codes: true
            display_current_span: true
            display_filename: true
            display_level: true
            display_line_number: true
            display_resource: true
            display_service_name: true
            display_service_namespace: true
            display_span_id: true
            display_trace_id: true
            display_span_list: true
            display_target: true
            display_thread_id: true
            display_thread_name: true
            display_timestamp: true
    tracing:
      common:
        service_name: "${env.OTEL_SERVICE_NAME}"
      otlp:
        enabled: true
        endpoint: "${env.OTEL_EXPORTER_OTLP_ENDPOINT}"
        protocol: http
      experimental_response_trace_id:
        enabled: true
        format: hexadecimal
        header_name: rhg-trace-id
  apollo:
    buffer_size: 10000
    client_name_header: apollographql-client-name
    client_version_header: apollographql-client-version
    endpoint: "https://usage-reporting.api.apollographql.com/api/ingress/traces"
    experimental_local_field_metrics: false
    experimental_otlp_endpoint: "https://usage-reporting.api.apollographql.com/"
    send_variable_values: none
    send_headers:
      except:
        - Authorization
  instrumentation:
    spans:
      ## Set the mode for spans to be specification compliant.
      mode: spec_compliant
      default_attribute_requirement_level: required
      supergraph:
        attributes:
          cost.result:
            alias: rhg-query-cost-result
          cost.actual:
            alias: rhg-query-cost-actual
          cost.estimated:
            alias: rhg-query-cost-estimated
          cost.delta:
            alias: rhg-query-cost-delta
          graphql.document:
            alias: rhg-graphql-query
          graphql.operation.name:
            alias: rhg-graphql-operation-name
          graphql.operation.type:
            alias: rhg-graphql-operation-type
      router:
        attributes:
          baggage: true
      subgraph:
        attributes:
          subgraph.name: true
          subgraph.graphql.document: true
          subgraph.graphql.operation.name: true
          subgraph.graphql.operation.type: true
          http.request.resend_count: true
    instruments:
      default_requirement_level: required
      router:
        http.server.active_requests: true
        http.server.request.body.size: true
        http.server.request.duration: true
        http.server.response.body.size: true
      subgraph:
        http.client.request.body.size: true
        http.client.request.duration: true
        http.client.response.body.size: true
      connector:
        http.client.request.body.size: true
        http.client.request.duration: true
        http.client.response.body.size: true
      graphql:
        field.execution: true
        list.length: true
      cache:
        apollo.router.operations.entity.cache: true
      supergraph:
        cost.actual: true
        cost.delta: true
        cost.estimated: true
    events:
      supergraph:
        COST_ACTUAL_TOO_EXPENSIVE:
          message: "cost actual is high"
          on: event_response
          level: error
          condition:
            gt:
              - cost: actual
              - 250000
          attributes:
            graphql.operation.name: true
            cost.actual: true
This is my telemetry configuration.
With 1.x versions everything works fine.
What does OTEL_EXPORTER_OTLP_ENDPOINT look like? What's the scheme (is it specified, or is it missing)? What's the port? What's the path?
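For reference, a fully specified value makes the scheme, host, and port all explicit (hostnames here are illustrative):

OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.example:4318   # OTLP/HTTP, pairs with protocol: http
OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.example:4317   # OTLP/gRPC, pairs with protocol: grpc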
Can you provide a docker compose file in a repo?
docker-compose file
version: '3.8'
services:
  # Jaeger service for OpenTelemetry tracing
  jaeger:
    image: jaegertracing/all-in-one:1.35
    container_name: jaeger
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686" # Jaeger UI
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver
      - "5778:5778"   # Configuration API
      - "9411:9411"   # Zipkin compatible endpoint
    networks:
      - apollo-network

  # Apollo Router service
  apollo-router:
    build:
      context: .              # Build the Apollo Router from the Dockerfile in the current directory
      dockerfile: Dockerfile
    container_name: apollo-router
    ports:
      - "4000:4000" # Apollo Router default port
    env_file:
      - docker.env
    depends_on:
      - jaeger # Ensure Jaeger starts before the Apollo Router
    networks:
      - apollo-network

# Define a custom network for communication between services
networks:
  apollo-network:
    driver: bridge
OTEL_EXPORTER_OTLP_GRPC_ENDPOINT=http://localhost:4317
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
Issues arise when OTEL_EXPORTER_OTLP_GRPC_ENDPOINT is set to http://localhost:4317/ (for OTLP/gRPC trace collection) and OTEL_EXPORTER_OTLP_ENDPOINT is also set to http://localhost:4318/. The problems occur even if the OTEL_EXPORTER_OTLP_ENDPOINT variable is not actively used for data export.
In our OpenShift environments, OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_GRPC_ENDPOINT are set by default by IT. The Router reads both environment variables by default and throws the error.
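One thing worth checking with the compose setup above (a guess, not a confirmed diagnosis): inside the apollo-router container, localhost resolves to the router container itself, not to Jaeger, so endpoints pointing at http://localhost:4317 or http://localhost:4318 would never reach the collector. Addressing Jaeger by its compose service name would look like this in docker.env:

# inside the compose network, services are reachable by service name
OTEL_EXPORTER_OTLP_GRPC_ENDPOINT=http://jaeger:4317
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318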
I have the following error:
OpenTelemetry metric error occurred: Metrics exporter otlp failed with the grpc server returns error (Unknown error): , detailed error message: transport error tonic::transport::Error(Transport, hyper::Error(Io, Kind(ConnectionReset)))
but exporting works fine. I wonder if it is possible to configure logging to skip this error. What should I use to do that?
I tried RUST_LOG=tonic::transport/ConnectionReset, but it doesn't seem to work.
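A note on the filter syntax, as a sketch assuming the Router's log filter follows tracing-subscriber's EnvFilter conventions: the /pattern suffix is env_logger's message-regex form, which EnvFilter does not support, and the error is emitted under the Router's own target (apollo_router::plugins::telemetry::error_handler in the log above), not tonic's. EnvFilter directives are comma-separated target=level pairs, so silencing that target would look like:

# comma-separated target=level directives; "off" disables a target entirely.
# Note this hides genuine exporter failures from that module as well.
APOLLO_ROUTER_LOG=info,apollo_router::plugins::telemetry::error_handler=off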