caught signal (SIGSEGV) due to many timeouts
Bug Report
Describe the bug When a DNS issue causes Fluent-bit to time out while sending logs to the targets (in our case Splunk and CloudWatch), 4 out of 6 fluent-bit containers (the containers, not the pods) in k8s crash after around 30 minutes. The pods remain up but with no containers running.
To Reproduce
- Have Fluent-bit up and running in Kubernetes, installed via Terraform with kubernetes_daemonset
Steps to reproduce the problem:
- 8 fluent-bit pods are up and running
- The DNS issue starts and Fluent-bit starts logging the following errors
[ warn] [engine] failed to flush chunk '1-1756969325.329701800.flb', retry in 8 seconds: task_id=0, input=tail.0 > output=splunk.2 (out_id=2)
[error] [upstream] connection #295 to http-inputs-<hiddenpart>.splunkcloud.com:443 timed out after 10 seconds
[ warn] [engine] chunk '1-1756969305.177504284.flb' cannot be retried: task_id=2, input=tail.0 > output=splunk.2
- After around 30 minutes, 4 pods crash, and these are the last 2 log lines, in this order
[engine] caught signal (SIGSEGV)
[ warn] [net] getaddrinfo(host='logs.us-west-2.amazonaws.com', err=12): Timeout while contacting DNS servers
Expected behavior Fluent-bit should keep running and keep retrying to send the logs to the target.
Your Environment
- Version used: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable (couldn't find in the log the specific version)
- Configuration:
- Environment name and version: Amazon Elastic Kubernetes Service - eks.16
- Server type and version: Amazon Linux 2 - Kernel version 5.10.236-228.935.amzn2.x86_64
Additional context We lose all logs during the DNS blip. After the DNS issue is resolved, fluent-bit is not able to send the application logs that were generated during the blip. A few pods were able to send logs after the blip, but for some pods the logs are lost entirely.
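For reference, whether chunks survive an outage like this depends largely on a few buffering and retry settings. The fragment below is a minimal sketch in Fluent Bit's classic config format, not the configuration used in this report. The storage path, size limit, and tail path are placeholder assumptions, and required output keys such as the Splunk host and token are omitted. It does not address the SIGSEGV itself. It only illustrates keeping chunks on disk and retrying without a cap, since "cannot be retried" is typically logged once the retry limit (default 1) is exhausted.

```
[SERVICE]
    # Assumed path; it must live on a volume that survives container
    # restarts (for example a hostPath mount) for buffered chunks to persist.
    storage.path              /var/fluent-bit/state/flb-storage/
    storage.sync              normal

[INPUT]
    Name                      tail
    # Assumed log location for a Kubernetes node.
    Path                      /var/log/containers/*.log
    # Buffer chunks on disk instead of memory only.
    storage.type              filesystem

[OUTPUT]
    Name                      splunk
    Match                     *
    # Retry indefinitely instead of dropping the chunk after the default limit.
    Retry_Limit               False
    # Assumed cap on disk used by chunks queued for this output.
    storage.total_limit_size  500M
```

With unlimited retries, a long outage can fill the disk buffer, so storage.total_limit_size bounds how much is kept. Once the limit is reached, the oldest chunks queued for that output are discarded.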
Which Fluent Bit version is this?