
[error] [opentelemetry] snappy decompression failed issue

Open 3siksfather opened this issue 1 year ago • 2 comments

Bug Report

Describe the bug

Actual Behavior: Fluent Bit throws a snappy decompression failed error. Metrics are missing in Grafana, likely due to issues with decompressing the data.

To Reproduce

The fluent-bit pod fails at random times with the log: [error] [opentelemetry] snappy decompression failed

Screenshots: (screenshot omitted)

Your Environment

    • prometheus-server: latest
    • fluent-bit: latest
    • opentelemetry-collector: latest

configmap

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Config_Watch On
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020
    Health_Check On

[INPUT]
    name   prometheus_remote_write
    listen 0.0.0.0
    port   8080

[OUTPUT]
    name  stdout
    match *

[OUTPUT]
    Name  opentelemetry
    Match *
    Host  opentelemetry-collector.default.svc.cluster.local
    Port  4318

Additional context

Request: Could you please help investigate the root cause of the snappy decompression failed error? Additionally, if there are any configuration changes or updates to the OpenTelemetry exporter or Fluent Bit that might resolve this issue, that would be helpful.

3siksfather avatar Dec 17 '24 08:12 3siksfather

pls share your full config. Also, if there is a way to get the payload that is generating the issue, that would be very helpful.
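
(One possible way to capture such a payload, as a minimal sketch not taken from this thread: temporarily point the Prometheus remote_write URL at a small listener that dumps the request body and checks whether it is valid block-format snappy. It assumes the python-snappy package; the port matches the Fluent Bit input above, and the dump filename is arbitrary.)

# capture_remote_write.py -- stand-in listener for the remote_write target (illustrative only)
from http.server import BaseHTTPRequestHandler, HTTPServer

import snappy  # pip install python-snappy


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            # Prometheus remote_write bodies are snappy block format
            raw = snappy.uncompress(body)
            print(f"OK: {len(body)} compressed bytes -> {len(raw)} decompressed bytes")
        except Exception as exc:
            # keep the offending payload so it can be attached to the issue
            print(f"decompression failed ({exc}); dumping payload to bad_payload.bin")
            with open("bad_payload.bin", "wb") as f:
                f.write(body)
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CaptureHandler).serve_forever()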

edsiper avatar Dec 17 '24 16:12 edsiper

@edsiper Is there a specific file or config you want? All configurations are the defaults from the latest version of the Helm chart; the only modifications are the settings required for the integration.


configmap

prometheus remote_write:

- url: "http://fluent-bit-metric.default.svc.cluster.local:8080"
  write_relabel_configs:
    - action: drop
      regex: (~~~)
      source_labels: [__name__]
    - action: drop
      regex: kubernetes-apiservers
      source_labels: [job]
  queue_config:
    capacity: 4000              # default 2500
    max_shards: 50              # default = 200
    min_shards: 10              # default = 1
    max_samples_per_send: 2000  # default = 500
    batch_send_deadline: 15s    # default = 5s
    min_backoff: 30ms           # default = 30ms
    max_backoff: 100ms          # default = 100ms
  metadata_config:
    send_interval: 10s          # default = 1m


fluent-bit: same configmap as shown above

opentelemetry:

processors:
  batch: {}
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:14250
      thrift_compact:
        endpoint: ${env:MY_POD_IP}:6831
      thrift_http:
        endpoint: ${env:MY_POD_IP}:14268
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        endpoint: ${env:MY_POD_IP}:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: opentelemetry-collector
          scrape_interval: 10s
          static_configs:
            - targets:
                - ${env:MY_POD_IP}:8888
  zipkin:
    endpoint: ${env:MY_POD_IP}:9411
service:
  extensions:
    - health_check
  pipelines:
    logs:
      exporters:
        - debug
      processors:
        - memory_limiter
        - batch
      receivers:
        - otlp
    metrics:
      exporters:
        - debug
      processors:
        - memory_limiter
        - batch
      receivers:
        - otlp
        - prometheus
    traces:
      exporters:
        - debug
      processors:
        - memory_limiter
        - batch
      receivers:
        - otlp
        - jaeger
        - zipkin
  telemetry:
    metrics:
      address: ${env:MY_POD_IP}:8888

3siksfather avatar Dec 18 '24 04:12 3siksfather

This symptom also appears in the latest version.


3siksfather avatar Jan 21 '25 05:01 3siksfather

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Apr 22 '25 02:04 github-actions[bot]

That's a typo; the plugin that's failing is the prometheus_remote_write input plugin. Could you explain to me what the data source is so I can try to set it up locally?
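
(As an illustrative sketch, not from this thread: Prometheus remote_write bodies are snappy block format, so anything that truncates or re-encodes the body on the way to the input would make block decompression fail. Assumes the python-snappy package; the sample bytes are arbitrary.)

import snappy

data = b"sample write request bytes " * 32

block = snappy.compress(data)                       # block format, what Prometheus sends
framed = snappy.StreamCompressor().add_chunk(data)  # snappy framing format, a different encoding

snappy.uncompress(block)                            # succeeds: valid block snappy
for bad in (framed, block[: len(block) // 2]):      # framed or truncated bodies
    try:
        snappy.uncompress(bad)
    except Exception as exc:
        print("block decompression rejects it:", exc)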

leonardo-albertovich avatar Apr 22 '25 11:04 leonardo-albertovich

@leonardo-albertovich For operational reasons, we are using Fluent Bit with the Prometheus remote write input plugin. The data sources being scraped are fairly standard: kube-state-metrics, node-exporter, and blackbox-exporter.

This issue does not occur in a standard setup; it only appears when Fluent Bit is in the path. To verify this behavior, we ran a local test with the same configuration and Fluent Bit, and we were able to reproduce the issue.

Test scenario: Prometheus (remote write) -> Fluent Bit (input: prometheus_remote_write, output: prometheus) -> new Prometheus <- Grafana (data source)
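
(A quick probe that could narrow this down, again a sketch rather than something from the thread: POST a body that is deliberately not valid snappy to the prometheus_remote_write input and watch the Fluent Bit log. If the same "snappy decompression failed" line shows up, any malformed body triggers it; if not, the payloads coming from Prometheus itself need closer inspection. The URL is the one from the remote_write config above; the headers mimic what Prometheus sends.)

import urllib.request

req = urllib.request.Request(
    "http://fluent-bit-metric.default.svc.cluster.local:8080",
    data=b"definitely not snappy",
    headers={
        "Content-Type": "application/x-protobuf",
        "Content-Encoding": "snappy",
        "X-Prometheus-Remote-Write-Version": "0.1.0",
    },
    method="POST",
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("HTTP status:", resp.status)
except Exception as exc:
    print("request failed:", exc)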

3siksfather avatar Jul 07 '25 08:07 3siksfather

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Oct 24 '25 02:10 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Nov 17 '25 02:11 github-actions[bot]