Gzip Decompression Failure Due to 100MB Limit in Fluent Bit 3.0.7
Bug Report
I'm encountering an issue where Fluent Bit fails to decompress gzip payloads on the aggregator's forward input because they exceed the maximum decompression size of 100MB. Below are the relevant error logs and the configurations for both the collector and the aggregator.
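Judging by the error message, the forward input rejects a gzip payload up front when its uncompressed size would exceed a hard 100MB cap. For reference, a rough sketch of how such a guard is typically implemented, reading the uncompressed size from the gzip ISIZE trailer before inflating; the constant and function names are my own illustration, not the actual Fluent Bit source:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cap mirroring the value reported in the error logs */
#define MAX_DECOMPRESSION_SIZE (100 * 1024 * 1024)

/* Return 0 if the gzip payload may be inflated, -1 if it exceeds the cap.
 * The gzip trailer stores ISIZE (uncompressed size modulo 2^32) in its
 * last four bytes, little-endian, so the check can happen before inflate. */
static int check_gzip_uncompressed_size(const uint8_t *buf, size_t len)
{
    uint32_t isize;

    if (len < 4) {
        return -1;
    }

    isize = (uint32_t) buf[len - 4]
          | (uint32_t) buf[len - 3] << 8
          | (uint32_t) buf[len - 2] << 16
          | (uint32_t) buf[len - 1] << 24;

    if (isize > MAX_DECOMPRESSION_SIZE) {
        fprintf(stderr, "[gzip] maximum decompression size is 100MB\n");
        return -1;
    }
    return 0;
}

int main(void)
{
    /* Fake trailer claiming ~128MB uncompressed (0x08000000 bytes) */
    uint8_t trailer[4] = {0x00, 0x00, 0x00, 0x08};
    return check_gzip_uncompressed_size(trailer, sizeof(trailer)) == 0 ? 0 : 1;
}
```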
To Reproduce
Example log message
[2024/07/08 08:05:26] [error] [gzip] maximum decompression size is 100MB
[2024/07/08 08:05:26] [error] [input:forward:forward.0] gzip uncompress failure
[2024/07/08 08:05:52] [error] [gzip] maximum decompression size is 100MB
[2024/07/08 08:05:52] [error] [input:forward:forward.0] gzip uncompress failure
[2024/07/08 08:06:08] [error] [gzip] maximum decompression size is 100MB
[2024/07/08 08:06:08] [error] [input:forward:forward.0] gzip uncompress failure
[2024/07/08 08:06:20] [error] [gzip] maximum decompression size is 100MB
[2024/07/08 08:06:20] [error] [input:forward:forward.0] gzip uncompress failure
Steps to reproduce the problem
Set up Fluent Bit with the provided collector and aggregator configurations.
Monitor the aggregator logs for gzip decompression errors from the forward input.
Expected behavior
Fluent Bit should decompress the forwarded gzip payloads on the aggregator without hitting the maximum decompression size limit.
Screenshots
N/A
Your Environment
Version used: Fluent Bit 3.0.7
Configuration:
Collector Configuration:
[SERVICE]
daemon false
log_level warn
storage.path /var/fluent-bit/state/flb-storage/
storage.sync normal
storage.max_chunks_up 32
storage.backlog.mem_limit 32MB
storage.metrics true
storage.delete_irrecoverable_chunks true
http_server true
http_listen 0.0.0.0
http_port 2020
[INPUT]
name tail
path /var/log/containers/*.log
tag_regex (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
tag kube.<namespace_name>.<pod_name>.<container_name>
read_from_head true
multiline.parser cri
skip_long_lines true
skip_empty_lines true
buffer_chunk_size 32KB
buffer_max_size 32KB
db /var/fluent-bit/state/flb-storage/tail-containers.db
db.sync normal
db.locking true
db.journal_mode wal
storage.type filesystem
[OUTPUT]
name forward
match *
host fluent-bit-aggregator.observability.svc.cluster.local
port 24224
compress gzip
workers 2
retry_limit false
storage.total_limit_size 16GB
Aggregator Configuration:
[SERVICE]
daemon false
log_level warn
storage.path /fluent-bit/data
storage.sync full
storage.backlog.mem_limit 128M
storage.metrics true
storage.delete_irrecoverable_chunks true
storage.max_chunks_up 64
http_server true
http_listen 0.0.0.0
http_port 2020
[INPUT]
name forward
listen 0.0.0.0
port 24224
buffer_chunk_size 1M
buffer_max_size 4M
storage.type filesystem
[OUTPUT]
name loki
match *
host loki-gateway.logging.svc.cluster.local
port 80
line_format json
auto_kubernetes_labels false
label_keys $cluster, $namespace, $app
storage.total_limit_size 16GB
Environment name and version (e.g. Kubernetes? What version?)
Kubernetes 1.30, 1.29, 1.28
Server type and version
AKS/EKS
Operating System and version
Ubuntu, AL2, AL2023 and BottlerocketOS
Filters and plugins
See above
Additional context
This issue persists across all Fluent Bit instances with the same configuration. Both the collector and the aggregator run the same Fluent Bit version (3.0.7). The rate of records processed is consistently around 800 per second, so the volume is not particularly high. Any guidance or solution to resolve this issue would be greatly appreciated.
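To illustrate why the cap can be hit even at this modest rate: deflate (the algorithm behind gzip) can easily exceed a 50:1 ratio on repetitive log lines, so a compressed forward payload of only a few megabytes can inflate past 100MB. A minimal standalone sketch using plain zlib (the sample line and sizes are hypothetical) that demonstrates the ratio:

```c
/* Build: cc ratio.c -o ratio -lz */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    /* ~16MB of a repetitive, log-like line (hypothetical content) */
    const char *line =
        "2024-07-08T08:05:26Z stdout F level=info msg=\"health check ok\"\n";
    size_t line_len = strlen(line);
    size_t src_len = (size_t) 16 * 1024 * 1024;
    unsigned char *src = malloc(src_len);
    size_t off;

    for (off = 0; off + line_len <= src_len; off += line_len) {
        memcpy(src + off, line, line_len);
    }
    memset(src + off, 'x', src_len - off);  /* pad the tail */

    /* Compress with zlib (the same deflate algorithm gzip uses) */
    uLongf dst_len = compressBound(src_len);
    unsigned char *dst = malloc(dst_len);
    if (compress2(dst, &dst_len, src, src_len, Z_DEFAULT_COMPRESSION) != Z_OK) {
        fprintf(stderr, "compress2 failed\n");
        return 1;
    }

    printf("uncompressed=%zu bytes, compressed=%lu bytes, ratio=%.1f:1\n",
           src_len, (unsigned long) dst_len,
           (double) src_len / (double) dst_len);

    free(src);
    free(dst);
    return 0;
}
```

On highly repetitive input like this the ratio is typically well above 50:1, which would be consistent with a payload tripping a 100MB uncompressed cap even though the compressed traffic on the wire stays small.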