router icon indicating copy to clipboard operation
router copied to clipboard

OpenTelemetry trace error

Open deweyjose opened this issue 3 years ago • 7 comments

Describe the bug We're seeing the following error in our Router logs:

{"timestamp":"2022-11-07T14:34:48.904971Z","level":"ERROR","fields":{"message":"OpenTelemetry trace error occurred: error sending request for url (*********/traces): connection closed before message completed"},"target":"apollo_router::plugins::telemetry"}

deweyjose avatar Nov 07 '22 18:11 deweyjose

Thanks for opening this issue! This could be tied to #2046

o0Ignition0o avatar Nov 07 '22 18:11 o0Ignition0o

I'm seeing a similar issue, but may not be related:

{"timestamp":"2022-11-08T19:24:19.723074Z","level":"ERROR","fields":{"message":"OpenTelemetry trace error occurred: oneshot canceled"},"target":"apollo_router::plugins::telemetry"}

Running v1.2.1

kmcrawford avatar Nov 08 '22 19:11 kmcrawford

Not sure about the oneshot, but the other issue is likely to be caused by the BatchSpanProcessor defaults not being good for production.

In the short term the following env variables are available to tweak: https://opentelemetry.io/docs/reference/specification/sdk-environment-variables/#batch-span-processor

Longer term we are looking to add this configuration to the router.yaml

BrynCooke avatar Nov 16 '22 08:11 BrynCooke

Going to close this for now in favor of #1789 as I think this is resolved by tweaking the batch span processor settings. If it still persists then feel free to reopen.

BrynCooke avatar Nov 18 '22 11:11 BrynCooke

This issue is till occurring in v1.37.0 (#1789 appears to be a different issue; notice the error message is different).

Here are the relevant error messages:

OpenTelemetry trace error occurred: error sending request for url (http://localhost:8126/v0.5/traces): connection closed before message completed
OpenTelemetry trace error occurred: error sending request for url (http://localhost:8126/v0.5/traces): connection error: Connection reset by peer (os error 104)

Here's my telemetry config (notice the errors are related to the datadog tracer):

telemetry:
  exporters:
    metrics:
      common:
        service_name: ${env.DD_SERVICE:-graphql-router}
      prometheus:
        enabled: true
        listen: 0.0.0.0:9090
        path: /metrics
    tracing:
      common:
        service_name: ${env.DD_SERVICE:-graphql-router}
      datadog:
        enabled: true
        endpoint: ${env.DD_ENDPOINT}
        enable_span_mapping: true

Could be related to https://github.com/open-telemetry/opentelemetry-rust-contrib/issues/7

jtribble avatar Jan 05 '24 21:01 jtribble

reopening as it is apparently still not resolved and we are getting more reports of it. Apparently the Datadog agent is correctly receiving the trace, but abruptly closing the connection. Other libraries have workarounds for it: https://github.com/will-bank/datadog-tracing/blob/30cdfba8d00caa04f6ac8e304f76403a5eb97129/src/tracer.rs#L29

Geal avatar Mar 12 '24 16:03 Geal

I had a go at this and although better it still produced occasional errors in the logs.

BrynCooke avatar Jun 25 '24 20:06 BrynCooke