
batchprocessor: send_batch_max_size_bytes limit

Open jangaraj opened this issue 3 years ago • 9 comments

Is your feature request related to a problem? Please describe. The Golang gRPC server has a default message size limit of 4 MB. The batch processor can produce messages larger than that, so the receiver will reject the batch and the whole batch can be dropped:

"msg": "Exporting failed. The error is not retryable. Dropping data.",
"kind": "exporter",
"data_type": "traces",
"name": "otlp",
"error": "Permanent error: rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (5297928 vs. 4194304)",
"dropped_items": 4725,

The current batchprocessor config options don't provide a way to prevent this situation, because they work only with span counts, not with the overall batch size in bytes. send_batch_max_size is also a count of spans.

Describe the solution you'd like A new config option send_batch_max_size_bytes (maybe there is a better name), defaulting to the gRPC 4 MB limit (4194304 bytes), which would ensure that a batch never exceeds this size.
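
A rough sketch of how the proposed option could sit next to the existing batch processor settings; send_batch_max_size_bytes is only the name suggested here and is not an existing option:

processors:
  batch:
    # existing count-based settings (spans / data points / log records)
    send_batch_size: 8192
    send_batch_max_size: 10000
    # proposed, NOT implemented: cap the encoded batch size so it stays
    # under the default gRPC limit of 4 MiB (4194304 bytes)
    send_batch_max_size_bytes: 4194304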

Describe alternatives you've considered At the moment a user can tune send_batch_size/send_batch_max_size, but in theory there can be a few traces with huge spans (e.g. Java backtraces with logs) and the default 4 MB gRPC message limit can still be exceeded. Maybe the OTLP exporter could handle this message size limitation.
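
A minimal example of that count-based workaround, assuming an average encoded span of roughly 2 KB (a guess that has to be validated per workload); a few oversized spans can still break the estimate:

processors:
  batch:
    # back-of-the-envelope: ~2 KB per encoded span * 1500 spans ≈ 3 MB,
    # which leaves headroom under the 4 MiB gRPC default, but huge spans
    # can still push a batch past the limit
    send_batch_size: 1000
    send_batch_max_size: 1500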

jangaraj avatar Sep 10 '22 07:09 jangaraj

:+1: for this feature. We're currently doing some trial and error to figure out the right balance of send_batch_size and send_batch_max_size and hoping it stays under 4 MB, but having a guarantee would definitely be preferred.

evandam avatar Oct 13 '22 21:10 evandam

@evandam I made some recommendations here https://github.com/monitoringartist/opentelemetry-trace-pipeline-poisoning#mitigation-of-huge-4mb-trace

jangaraj avatar Oct 13 '22 21:10 jangaraj

Nice link, thank you! It definitely still relies on some back-of-the-envelope math which is bound to be wrong sooner or later, and it would be great to have an easy way to do this at the exporter/collector level.

evandam avatar Oct 14 '22 16:10 evandam

The size-based batching will only work if the processor is used with the OTLP exporter; other exporters will have different batch sizes due to different encodings. I believe if we go with https://github.com/open-telemetry/opentelemetry-collector/issues/4646, we should be able to provide this for any exporter.
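
If the exporter-side batching proposed in that issue were used for this, the limit could be expressed per exporter in its own encoding. The keys below are purely illustrative and not a shipped configuration:

exporters:
  otlp:
    endpoint: backend:4317
    # illustrative only: a per-exporter batcher sizing batches by encoded
    # bytes, capped near the 4 MiB gRPC default
    batcher:
      enabled: true
      max_size_bytes: 4000000  # hypothetical key, not an existing option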

dmitryax avatar Apr 05 '23 18:04 dmitryax

Describe alternatives you've considered At the moment a user can tune send_batch_size/send_batch_max_size, but in theory there can be a few traces with huge spans (e.g. Java backtraces with logs) and the default 4 MB gRPC message limit can still be exceeded. Maybe the OTLP exporter could handle this message size limitation.

For those willing to allocate more memory for each gRPC message, there is also the max_recv_msg_size_mib option in the downstream OTLP collector's receiver config.

https://github.com/open-telemetry/opentelemetry-collector/issues/1122#issuecomment-1765478663
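
For the collector-to-collector case, that option belongs in the gRPC server settings of the downstream collector's OTLP receiver (not in the exporter), for example:

# configuration of the DOWNSTREAM collector (the one receiving the data)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        # raise the gRPC server's max receive message size
        # from the default 4 MiB to 16 MiB
        max_recv_msg_size_mib: 16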

cwegener avatar Oct 17 '23 03:10 cwegener

Describe alternatives you've considered At the moment a user can tune send_batch_size/send_batch_max_size, but in theory there can be a few traces with huge spans (e.g. Java backtraces with logs) and the default 4 MB gRPC message limit can still be exceeded. Maybe the OTLP exporter could handle this message size limitation.

For those willing to allocate more memory for each gRPC message, there is also the max_recv_msg_size_mib option in the downstream OTLP collector's receiver config.

#1122 (comment)

  • this param did not work for the otlp exporter
  • this is my config:
exporters:
  debug:
    # verbosity: detailed
    verbosity: normal
  otlp/tempo:
    max_recv_msg_size_mib: 200
    endpoint: tempo:4317
    tls:
      insecure: true
    auth:
      authenticator: headers_setter
  • and this is the output of the log:
2023/10/31 09:43:11 collector server run finished with error: failed to get config: cannot unmarshal the configuration: 1 error(s) decoding:

* error decoding 'exporters': error reading configuration for "otlp/tempo": 1 error(s) decoding:

* '' has invalid keys: max_recv_msg_size_mib

elysiumHL avatar Oct 31 '23 01:10 elysiumHL

this param did not work for the otlp exporter

No, it won't. The maximum receive message size option is only for the gRPC server side.

On the gRPC client side, the client's max receive message size must be provided in the call options when the client makes a call to the gRPC server.

What is your OTEL collector use case where the exporter receives such large messages from the remote OTLP receiver though? I cannot think of a scenario where this would even be the case.

cwegener avatar Oct 31 '23 02:10 cwegener

Did you manage to solve this issue?

lmnogues avatar May 30 '24 12:05 lmnogues

I think this is the issue which would resolve this eventually.

ptodev avatar Jun 21 '24 11:06 ptodev

Is there a way to dump / debug the spans causing that?

Update: I figured it out by configuring the otel collector this way, so it prints both the error message and all of the span details it sends:

...
config:
  exporters:
    otlp:
      endpoint: "otel-lb-collector:4317"
      tls:
        insecure: true
    debug: {}
    debug/detailed:
      verbosity: detailed
  extensions:
    health_check: {}
  processors:
    resourcedetection:
      detectors: [env, system]
    batch:
      send_batch_size: 1
      send_batch_max_size: 1
  ...
  service:
  ...
    pipelines:
    ...
      traces:
        exporters:
          - debug
          - debug/detailed
          - otlp
    ...

In my case the culprit was the Python Pymongo instrumentation with capture_statement enabled, so all of the content of an insert statement was captured. It was sent to the otel-agent through otlp/http fine, and then the error happened when the otel-agent sent it through otlp/grpc to the otel-gw.
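
For reference, one way to keep such oversized attribute values in check before batching is the contrib transform processor; this is only a sketch, the 4096-character limit is arbitrary, and the processor must be present in your build:

processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # truncate long string attribute values (e.g. captured DB
          # statements) so single spans cannot inflate the batch size
          - truncate_all(attributes, 4096)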

smoke avatar Jul 02 '24 16:07 smoke