[error] [opentelemetry] snappy decompression failed issue
Bug Report
Describe the bug
Actual Behavior: Fluent Bit throws a "snappy decompression failed" error, and metrics are missing in Grafana, likely because the incoming data fails to decompress.
To Reproduce
fluent-bit pod
The following error is logged at random times:
[error] [opentelemetry] snappy decompression failed
Screenshots
Any/all queries in Grafana return no metrics.
Your Environment
- prometheus-server: latest
- fluent-bit: latest
- opentelemetry-collector: latest
configmap

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Config_Watch On
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020
    Health_Check On

[INPUT]
    name   prometheus_remote_write
    listen 0.0.0.0
    port   8080

[OUTPUT]
    name   stdout
    match  *

[OUTPUT]
    Name   opentelemetry
    Match  *
    Host   opentelemetry-collector.default.svc.cluster.local
    Port   4318
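A minimal sketch for exercising the prometheus_remote_write input above from outside the cluster: it POSTs a snappy block-compressed body to 127.0.0.1:8080 (the address from the [INPUT] section). The /api/v1/write path and the placeholder body are assumptions, not taken from this report; even a successful snappy decode should then fail at the protobuf stage, so the only question being tested is whether the decompression step accepts a well-formed block-format payload.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"

	"github.com/golang/snappy"
)

func main() {
	// Prometheus remote write uses snappy *block* format (snappy.Encode),
	// not the framed/stream format. The body below is a placeholder, not a
	// real prompb.WriteRequest, so a protobuf error afterwards is expected.
	raw := []byte("placeholder payload, not a real WriteRequest")
	compressed := snappy.Encode(nil, raw)

	// Host/port come from the [INPUT] section above; the /api/v1/write path
	// is an assumption (the conventional remote-write path), not confirmed
	// against the plugin's defaults.
	req, err := http.NewRequest(http.MethodPost,
		"http://127.0.0.1:8080/api/v1/write", bytes.NewReader(compressed))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Encoding", "snappy")
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```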
Additional context
Request: Could you please help investigate the root cause of the "snappy decompression failed" error? If there are any configuration changes or updates to the OpenTelemetry exporter or Fluent Bit that might resolve this issue, that would also be helpful.
Please share your full config. If there is a way to get the payload that is generating the issue, that would be very helpful.
@edsiper Is there a specific file or configuration you need? All configurations are the defaults from the latest version of the Helm chart; the only modifications are the settings required for the integration.
configmap
prometheus

remote_write:
  - url: "http://fluent-bit-metric.default.svc.cluster.local:8080"
    write_relabel_configs:
      - action: drop
        regex: (~~~)
        source_labels: [__name__]
      - action: drop
        regex: kubernetes-apiservers
        source_labels: [job]
    queue_config:
      capacity: 4000             # default 2500
      max_shards: 50             # default 200
      min_shards: 10             # default 1
      max_samples_per_send: 2000 # default 500
      batch_send_deadline: 15s   # default 5s
      min_backoff: 30ms          # default 30ms
      max_backoff: 100ms         # default 100ms
    metadata_config:
      send_interval: 10s         # default 1m
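To capture the exact bodies Prometheus is sending (the payload requested above), the remote_write url could temporarily be pointed at a small dump server. This is only a sketch under that assumption; the server and the payload file names are hypothetical, not part of the original setup.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Accept any remote-write POST and dump the raw (still snappy-compressed)
	// body to a file, so the failing payload can be shared or inspected offline.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		name := fmt.Sprintf("payload-%d.snappy", time.Now().UnixNano())
		if err := os.WriteFile(name, body, 0o644); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Printf("wrote %s (%d bytes, Content-Encoding=%q)\n",
			name, len(body), r.Header.Get("Content-Encoding"))
		w.WriteHeader(http.StatusNoContent)
	})

	// Same port as the Fluent Bit input, so only the host part of the
	// remote_write url needs to change for the capture run.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		panic(err)
	}
}
```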
fluent-bit: same configmap as above
opentelemetry

processors:
  batch: {}
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:14250
      thrift_compact:
        endpoint: ${env:MY_POD_IP}:6831
      thrift_http:
        endpoint: ${env:MY_POD_IP}:14268
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: opentelemetry-collector
          scrape_interval: 10s
          static_configs:
            - targets:
                - ${env:MY_POD_IP}:8888
  zipkin:
    endpoint: ${env:MY_POD_IP}:9411
service:
  extensions:
    - health_check
  pipelines:
    logs:
      exporters:
        - debug
      processors:
        - memory_limiter
        - batch
      receivers:
        - otlp
    metrics:
      exporters:
        - debug
      processors:
        - memory_limiter
        - batch
      receivers:
        - otlp
        - prometheus
    traces:
      exporters:
        - debug
      processors:
        - memory_limiter
        - batch
      receivers:
        - otlp
        - jaeger
        - zipkin
  telemetry:
    metrics:
      address: ${env:MY_POD_IP}:8888
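Since the Fluent Bit [OUTPUT] forwards to this collector's OTLP/HTTP receiver on port 4318, a quick sanity check of that leg is to POST an empty OTLP JSON metrics payload to /v1/metrics. This sketch is not from the report and assumes the port has been made reachable locally, for example via a port-forward.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// An empty resourceMetrics list is a valid OTLP/HTTP JSON request body,
	// so a 2xx here only shows that the receiver on 4318 is up and parsing
	// requests, nothing more.
	body := []byte(`{"resourceMetrics":[]}`)

	// Assumes the collector's 4318 port has been forwarded to localhost.
	resp, err := http.Post("http://127.0.0.1:4318/v1/metrics",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```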
This symptom also appears in the latest version.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
That's a typo; the plugin that's actually failing is the prometheus_remote_write input plugin. Could you explain what the data source is so I can try to set it up locally?
@leonardo-albertovich For operational reasons, we are using Fluent Bit with the Prometheus remote write input plugin. The data sources being scraped are fairly standard: kube-state-metrics, node-exporter, and blackbox-exporter.
This issue does not occur in a normal setup; it only appears when Fluent Bit is in the path. To verify this, we ran the same configuration locally with Fluent Bit and were able to reproduce the issue.
Test scenario: Prometheus (remote write) -> Fluent Bit (input: prometheus_remote_write, output: prometheus) -> new Prometheus <- Grafana (data source)
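Given a captured body (for example from the dump-server sketch earlier), a quick offline check of whether it is valid snappy block format, which is what Prometheus remote write produces, might look like the following; the file argument is hypothetical.

```go
package main

import (
	"fmt"
	"os"

	"github.com/golang/snappy"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: decode-check <captured payload file>")
		os.Exit(1)
	}
	body, err := os.ReadFile(os.Args[1])
	if err != nil {
		panic(err)
	}

	// Prometheus remote write compresses with the snappy block format, so a
	// valid capture should decode with snappy.Decode. If this fails, the
	// payload itself is malformed; if it succeeds, the failure is more likely
	// on the receiving side.
	decoded, err := snappy.Decode(nil, body)
	if err != nil {
		fmt.Printf("snappy block decode failed: %v (%d byte body)\n", err, len(body))
		os.Exit(1)
	}
	fmt.Printf("snappy block decode ok: %d -> %d bytes\n", len(body), len(decoded))
}
```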
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This issue was closed because it has been stalled for 5 days with no activity.