opentelemetry-go

Jaeger exporter complaining "span too large to send"

Open Baliedge opened this issue 2 years ago • 2 comments

Description

Some applications are dropping trace spans with errors like "multiple errors during transform" and "span too large to send". I suspect this may be a consequence of a workaround in #2663 that limits MaxPacketSize to 1472 so that Thrift packets can be sent over the wire. The affected Thrift packets are dropped because of this error, resulting in a missing or incomplete trace in the Jaeger UI.

I suspect this reduced size also limits the maximum span payload, since the exporter doesn't appear to subdivide the content any further.
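For context, 1472 is what remains of a 1500-byte Ethernet MTU after a 20-byte IPv4 header (no options) and an 8-byte UDP header. The sketch below only illustrates that arithmetic and the kind of size check that would reject an oversized span. The names maxUDPPayload and fitsInPacket are hypothetical and are not taken from the exporter's source.

```go
package main

import "fmt"

const (
	ipv4HeaderLen = 20 // IPv4 header without options, in bytes
	udpHeaderLen  = 8  // fixed-size UDP header, in bytes
)

// maxUDPPayload returns the largest UDP payload that fits in a single
// unfragmented IPv4 packet for the given MTU.
func maxUDPPayload(mtu int) int {
	return mtu - ipv4HeaderLen - udpHeaderLen
}

// fitsInPacket reports whether a serialized span of spanBytes bytes fits
// within maxPacket bytes. This is a hypothetical stand-in for the check
// that produces "span too large to send".
func fitsInPacket(spanBytes, maxPacket int) bool {
	return spanBytes <= maxPacket
}

func main() {
	limit := maxUDPPayload(1500)
	fmt.Println(limit)                     // prints 1472
	fmt.Println(fitsInPacket(2048, limit)) // prints false
}
```

With that budget, a single span carrying a long log string twice, as in the dump further down, can exceed the limit on its own.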

Here is an example error scenario where the span contains some attributes and logs, but nothing excessive, IMHO:

2022/03/21 15:15:17 multiple errors during transform: span too large to send: Span({TraceIdLow:-1959931804685451170 TraceIdHigh:9142211311457415581 SpanId:3501129510158026876 ParentSpanId:-3699454375769775617 ... [very long dump truncated by me]

This was unreadable, so I pulled the source and added a JSON dump. Here's one such result from within the "multiple errors during transform" error:

span too large to send:{
  "traceIdLow": 8238679333188071647,
  "traceIdHigh": 6973323051039728651,
  "spanId": 1844865607382183757,
  "parentSpanId": -1272813624660893053,
  "operationName": "github.com/mailgun/turret/v2/client/golang.(*transaction).Close",
  "flags": 1,
  "startTime": 1647890117472575,
  "duration": 7196,
  "tags": [
    {
      "key": "file",
      "vType": "STRING",
      "vStr": "/Users/SPoulson/src/turret/client/golang/client.go:228"
    },
    {
      "key": "otel.library.name",
      "vType": "STRING",
      "vStr": "github.com/mailgun/turret/v2"
    },
    {
      "key": "error",
      "vType": "BOOL",
      "vBool": true
    },
    {
      "key": "otel.status_code",
      "vType": "STRING",
      "vStr": "ERROR"
    },
    {
      "key": "otel.status_description",
      "vType": "STRING",
      "vStr": "code:250  message:\"OK\"  utf8_enabled:true  mx_host:\"10.5.0.2\"  secure:true  smtp_log:\"19:15:17.475      0s \u003e- {19:15:17.468, #0, 0}\\n19:15:17.476      0s ** age=8.9931005s, sessionCount=9\\n19:15:17.476      0s \u003c- {#0}\\n19:15:17.476      0s -\u003e MAIL FROM:\[email protected]\u003e BODY=8BITMIME SMTPUTF8\\n19:15:17.477     1ms -\u003c 250 Sender address accepted\\n19:15:17.477     1ms -\u003e RCPT TO:\[email protected]\u003e\\n19:15:17.478     2ms -\u003c 250 Recipient address accepted\\n19:15:17.478     2ms -\u003e DATA\\n19:15:17.479     3ms -\u003c 354 Continue\\n19:15:17.480     4ms \u003e- {19:15:17.472, #1, 18}\\n19:15:17.480     4ms \u003e- {19:15:17.472, #2, 0, last}\\n19:15:17.481     5ms \u003c- {#1}\\n19:15:17.483     7ms -\u003c 250 Great success\\n19:15:17.484     8ms \u003c- {#2, last}\\n\"  mx_host_ip:\"10.5.0.2\"  tls_version:772  tls_cipher_suite:4865"
    }
  ],
  "logs": [
    {
      "timestamp": 1647890117479772,
      "fields": [
        {
          "key": "event",
          "vType": "STRING",
          "vStr": "exception"
        },
        {
          "key": "exception.type",
          "vType": "STRING",
          "vStr": "*errors.fundamental"
        },
        {
          "key": "exception.message",
          "vType": "STRING",
          "vStr": "code:250  message:\"OK\"  utf8_enabled:true  mx_host:\"10.5.0.2\"  secure:true  smtp_log:\"19:15:17.475      0s \u003e- {19:15:17.468, #0, 0}\\n19:15:17.476      0s ** age=8.9931005s, sessionCount=9\\n19:15:17.476      0s \u003c- {#0}\\n19:15:17.476      0s -\u003e MAIL FROM:\[email protected]\u003e BODY=8BITMIME SMTPUTF8\\n19:15:17.477     1ms -\u003c 250 Sender address accepted\\n19:15:17.477     1ms -\u003e RCPT TO:\[email protected]\u003e\\n19:15:17.478     2ms -\u003c 250 Recipient address accepted\\n19:15:17.478     2ms -\u003e DATA\\n19:15:17.479     3ms -\u003c 354 Continue\\n19:15:17.480     4ms \u003e- {19:15:17.472, #1, 18}\\n19:15:17.480     4ms \u003e- {19:15:17.472, #2, 0, last}\\n19:15:17.481     5ms \u003c- {#1}\\n19:15:17.483     7ms -\u003c 250 Great success\\n19:15:17.484     8ms \u003c- {#2, last}\\n\"  mx_host_ip:\"10.5.0.2\"  tls_version:772  tls_cipher_suite:4865"
        }
      ]
    }
  ]
}
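The dump shows the same long smtp_log text stored twice, once as otel.status_description and once as exception.message. One possible application-side mitigation, not something proposed in this thread, is to cap such strings before they are recorded on the span (for example before calling SetStatus or RecordError). A minimal sketch, assuming a hypothetical helper truncateUTF8 and an arbitrary byte budget:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 returns s cut to at most maxBytes bytes without splitting
// a multi-byte UTF-8 rune. Hypothetical helper, not part of opentelemetry-go.
func truncateUTF8(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// Back up until the byte at the cut position starts a rune, so the
	// prefix s[:cut] ends on a rune boundary.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	msg := "code:250  message:\"OK\"  utf8_enabled:true  smtp_log:\"...\""
	// 32 is an arbitrary budget chosen for illustration only.
	fmt.Println(truncateUTF8(msg, 32))
}
```

This trades detail for deliverability: the truncated text still reaches Jaeger, while the full log would have to be kept elsewhere.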

Environment

  • OS: Linux and MacOS 12.2
  • Architecture: amd64
  • Go Version: 1.17
  • opentelemetry-go version: 1.4.1

Steps To Reproduce

  1. Checkout repo: https://github.com/Baliedge/otel-span-too-large
  2. Run make run JAEGER_AGENT_HOST=<your_jaeger_agent>
  3. See error described above.
  4. Compare with make run JAEGER_AGENT_HOST=localhost, which does not generate this error.

Expected behavior

Expect trace in Jaeger UI to display all span details generated by code.

Baliedge, Mar 21 '22 20:03

Additionally, a question for the devs: is remote tracing a supported configuration with an MTU of 1500?

Baliedge, Mar 22 '22 13:03

FWIW, I have moved to the HTTP exporter. I've found that OTel doesn't seem to implement the full env var spec to configure this (i.e. OTEL_EXPORTER_JAEGER_PROTOCOL), but it works if configured programmatically like:

exporter, err := jaeger.New(jaeger.WithCollectorEndpoint())

This talks to the Jaeger collector directly at the URL in the env var OTEL_EXPORTER_JAEGER_ENDPOINT, e.g. http://jaeger.example.com:14268/api/traces

This bypasses the Jaeger agent and eliminates the payload bottleneck of UDP. In exchange, it seems to task the application with being its own agent.
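Because the endpoint comes from an environment variable, a typo there can surface only as missing traces. The sketch below validates the value before it is handed to the exporter. The function resolveEndpoint is hypothetical, and the fallback URL is an assumption based on the conventional Jaeger collector port shown in the example URL above, so check it against your deployment.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// Assumed fallback for illustration; verify against your Jaeger setup.
const defaultCollectorEndpoint = "http://localhost:14268/api/traces"

// resolveEndpoint validates a raw collector URL, falling back to the
// assumed default when raw is empty. Hypothetical helper.
func resolveEndpoint(raw string) (string, error) {
	if raw == "" {
		raw = defaultCollectorEndpoint
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse endpoint %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("endpoint %q must use http or https", raw)
	}
	if u.Host == "" {
		return "", fmt.Errorf("endpoint %q has no host", raw)
	}
	return u.String(), nil
}

func main() {
	endpoint, err := resolveEndpoint(os.Getenv("OTEL_EXPORTER_JAEGER_ENDPOINT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("collector endpoint:", endpoint)
}
```

The validated string could then be passed explicitly to the exporter's endpoint option instead of relying on the environment lookup alone.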

Baliedge, Jul 14 '22 13:07

@MrAlias I would like to work on this. Could you please point me to an entry point in the Jaeger code where I can start?

shubham-bansal96, Apr 04 '23 13:04

I should point out that the OTel Jaeger exporter is now deprecated in favor of the OTLP exporter. You can continue to use the Jaeger tracing server because it now supports OTLP.

Upon learning this, I've moved my projects to OTLP and have had no issues since.

https://opentelemetry.io/blog/2022/jaeger-native-otlp/

Baliedge, Apr 04 '23 18:04

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md#jaeger-exporter

Jaeger exporter support will be removed from OpenTelemetry in July 2023.

It is better to move to OTLP, as @Baliedge suggests.

pellared, Apr 04 '23 18:04

Thanks @pellared @Baliedge. I believe no code changes are required for this issue. If so, could you please close it?

shubham-bansal96, Apr 05 '23 08:04