
[exporter/elasticsearch] Use permanent error if applicable

carsonip opened this issue 9 months ago · 7 comments

Component(s)

exporter/elasticsearch

Is your feature request related to a problem? Please describe.

There may be places where the es exporter should return a permanent error (consumererror.NewPermanent()) but currently does not.

Describe the solution you'd like

Update the es exporter code to return permanent errors where applicable.
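
For reference, wrapping an error with consumererror.NewPermanent (from go.opentelemetry.io/collector/consumer/consumererror) looks roughly like this; a minimal, self-contained sketch, not actual exporter code:

package main

import (
    "errors"
    "fmt"

    "go.opentelemetry.io/collector/consumer/consumererror"
)

func main() {
    // A condition that cannot succeed on retry, e.g. an unsupported mapping mode.
    err := errors.New(`unsupported mapping mode "invalid_mode"`)

    // Mark it permanent so downstream retry logic knows not to retry the request.
    perr := consumererror.NewPermanent(err)

    fmt.Println(consumererror.IsPermanent(perr)) // true
}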

Describe alternatives you've considered

No response

Additional context

No response

carsonip · Feb 28 '25 13:02

Pinging code owners:

  • exporter/elasticsearch: @JaredTan95 @carsonip @lahsivjar

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] · Feb 28 '25 13:02

/label good-first-issue

carsonip · Mar 27 '25 09:03

I would like to attempt to make this enhancement. @carsonip please assign to me when you get a chance.

isaacaflores2 · May 15 '25 18:05

I don't have the permission to assign issues, but feel free to work on it.

carsonip · May 19 '25 10:05

Hey @carsonip, I have identified a few places where I think we can return a permanent error to reject the request due to document mapping or routing errors:

  1. The getRequestMappingMode method returns two errors when an invalid mapping mode is provided via the client metadata. These errors are propagated from the push data function for each signal (log example).
  2. The getScopeMappingMode method returns an error when an invalid mapping mode is set via the scope attribute. This error also applies to each signal (log example).
  3. ~~Lastly, for metric data only, the push data method returns an error for the entire request when there is an error finding the index to route to.~~
    • update: this error is only logged and not propagated to the client

The last two errors are interesting since they are at the scope level. If I understand the flow correctly we can partially process the earlier records, but then reject the entire request based on an invalid elastic.mapping.mode scope attribute. The collector retry logic is not clear to me (see below), but I think returning a permanent error should ensure this request is not retried.
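
To make the intent concrete, here is a rough, self-contained sketch of the kind of change I have in mind for case 1. The type and method bodies below are stand-ins, not the exporter's actual code; the only point is wrapping the mapping-mode error before it leaves the push function:

package main

import (
    "context"
    "fmt"

    "go.opentelemetry.io/collector/consumer/consumererror"
)

type exporterSketch struct{}

// Stand-in for getRequestMappingMode: fails when client metadata carries an unsupported mode.
func (e *exporterSketch) getRequestMappingMode(ctx context.Context) (string, error) {
    return "", fmt.Errorf("invalid context mapping mode: unsupported mapping mode %q", "invalid_mode")
}

// Stand-in for the per-signal push function (e.g. the logs push data method).
func (e *exporterSketch) pushLogs(ctx context.Context) error {
    mode, err := e.getRequestMappingMode(ctx)
    if err != nil {
        // An invalid mapping mode can never succeed on retry,
        // so return it as permanent instead of returning the bare error.
        return consumererror.NewPermanent(err)
    }
    _ = mode
    // ... encode and bulk-index the records ...
    return nil
}

func main() {
    err := (&exporterSketch{}).pushLogs(context.Background())
    fmt.Println(consumererror.IsPermanent(err)) // true
}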

I did some testing with a local collector on the first two cases:

  1. Error: invalid context mapping mode
  • I observed two log entries for the same request (so one retry attempt)
  • When wrapping the error with NewPermanent, I only saw one log entry (no retry)
  2. Error: invalid scope mapping mode
  • I observed one log entry for the same request (no retry)
  • When wrapping the error with NewPermanent, there was also no retry.
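
Regardless of where the retry happens, what the permanent wrapper changes is the retry decision itself. A simplified illustration of that check (not the collector's actual retry sender code, which adds backoff and more state):

package retrysketch

import "go.opentelemetry.io/collector/consumer/consumererror"

// sendWithRetry retries a send function up to maxAttempts times,
// but gives up immediately on permanent errors.
func sendWithRetry(send func() error, maxAttempts int) error {
    var err error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        if err = send(); err == nil {
            return nil
        }
        if consumererror.IsPermanent(err) {
            // Do not retry: the same request can never succeed.
            return err
        }
        // Non-permanent errors fall through and are retried (with backoff in the real code).
    }
    return err
}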

Let me know what you think.

isaacaflores2 · Jun 06 '25 23:06

one retry attempt

Q: When there are retries, are they retries from within the collector itself in earlier stages of the pipeline, i.e. receiver / processor, or from the client which sends the problematic request?

If it is the latter, how did the response change at the boundary? e.g. otlp / otlphttp response code

carsonip · Jun 07 '25 11:06

~~Hmm, not exactly sure where the retries were happening; I will need to get back to you on this one~~

Q: When there are retries, are they retries from within the collector itself in earlier stages of the pipeline, i.e. receiver / processor, or from the client which sends the problematic request?

The client (telemetrygen) retries by default. I confirmed by checking the collector internal metrics when we do not return a permanent error:

rpc_server_request_size_count{rpc_method="Export",rpc_service="opentelemetry.proto.collector.logs.v1.LogsService",rpc_system="grpc",service_instance_id="87f63512-cbb4-4f4c-b87a-fdb8cbb900ce",service_name="otelcontribcol",service_version="0.128.0-dev"} 2

If it is the latter, how did the response change at the boundary? e.g. otlp / otlphttp response code

I do see the otlp response changing, though, when using the telemetrygen client: telemetrygen logs --otlp-insecure 1 --otlp-header x-elastic-mapping-mode=\"invalid_mode\"

before returning the permanent error

2025-06-06T16:04:13.086-0700    FATAL   logs/worker.go:81       exporter failed {"worker": 0, "error": "context deadline exceeded: rpc error: code = Unavailable desc = invalid context mapping mode: unsupported mapping mode \"invalid_mode\", expected one of [\"bodymap\" \"ecs\" \"none\" \"otel\" \"raw\"]"}
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen/pkg/logs.worker.simulateLogs

after returning the permanent error

2025-06-06T16:07:10.446-0700    FATAL   logs/worker.go:81       exporter failed {"worker": 0, "error": "rpc error: code = Internal desc = Permanent error: invalid context mapping mode: unsupported mapping mode \"invalid_mode\", expected one of [\"bodymap\" \"ecs\" \"none\" \"otel\" \"raw\"]"}
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/telemetrygen/pkg/logs.worker.simulateLogs

isaacaflores2 · Jun 13 '25 22:06

Thanks for the example. LGTM.

carsonip · Jun 16 '25 22:06