caught signal (SIGSEGV) due to many timeouts

Open thiagobdp opened this issue 3 months ago • 2 comments

Bug Report

Describe the bug: When a DNS issue causes Fluent Bit to time out while sending logs to the target (in our case Splunk and CloudWatch), 4 out of 6 fluent-bit containers (the containers, not the pods) in k8s crash after around 30 minutes. The pods remain up but with no containers running.

To Reproduce

  • Have Fluent Bit up and running in Kubernetes, installed via Terraform with kubernetes_daemonset

Steps to reproduce the problem:

  • 8 fluent-bit pods are up and running

  • A DNS issue starts and Fluent Bit begins logging the following errors:

[ warn] [engine] failed to flush chunk '1-1756969325.329701800.flb', retry in 8 seconds: task_id=0, input=tail.0 > output=splunk.2 (out_id=2)
[error] [upstream] connection #295 to http-inputs-<hiddenpart>.splunkcloud.com:443 timed out after 10 seconds
[ warn] [engine] chunk '1-1756969305.177504284.flb' cannot be retried: task_id=2, input=tail.0 > output=splunk.2

  • After around 30 minutes, 4 pods crash; these are the last 2 log lines, in this order:

[engine] caught signal (SIGSEGV)
[ warn] [net] getaddrinfo(host='logs.us-west-2.amazonaws.com', err=12): Timeout while contacting DNS servers
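
To help quantify how many timeouts and dropped chunks precede the crash, the log lines above could be tallied with a small script. This is only a sketch: the regular expressions are derived solely from the lines quoted in this report, and the names `PATTERNS`, `SAMPLE_LOG`, and `summarize` are hypothetical helpers, not part of Fluent Bit.

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted in this report;
# other Fluent Bit versions may word these messages differently.
PATTERNS = {
    "flush_retry": re.compile(
        r"\[engine\] failed to flush chunk '(?P<chunk>[^']+)', retry in \d+ seconds"),
    "conn_timeout": re.compile(
        r"\[upstream\] connection #\d+ to (?P<host>\S+) timed out after \d+ seconds"),
    "chunk_dropped": re.compile(
        r"\[engine\] chunk '(?P<chunk>[^']+)' cannot be retried"),
    "dns_timeout": re.compile(
        r"\[net\] getaddrinfo\(host='(?P<host>[^']+)', err=\d+\)"),
    "sigsegv": re.compile(r"\[engine\] caught signal \(SIGSEGV\)"),
}

# Sample input: the exact lines from this report.
SAMPLE_LOG = [
    "[ warn] [engine] failed to flush chunk '1-1756969325.329701800.flb', "
    "retry in 8 seconds: task_id=0, input=tail.0 > output=splunk.2 (out_id=2)",
    "[error] [upstream] connection #295 to "
    "http-inputs-<hiddenpart>.splunkcloud.com:443 timed out after 10 seconds",
    "[ warn] [engine] chunk '1-1756969305.177504284.flb' cannot be retried: "
    "task_id=2, input=tail.0 > output=splunk.2",
    "[engine] caught signal (SIGSEGV)",
    "[ warn] [net] getaddrinfo(host='logs.us-west-2.amazonaws.com', err=12): "
    "Timeout while contacting DNS servers",
]


def summarize(lines):
    """Count each event type and collect the chunks that were given up on."""
    counts = Counter()
    dropped_chunks = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "chunk_dropped":
                    dropped_chunks.append(match.group("chunk"))
                break  # each line is counted under one event type only
    return counts, dropped_chunks


if __name__ == "__main__":
    counts, dropped_chunks = summarize(SAMPLE_LOG)
    print(dict(counts))
    print(dropped_chunks)
```

Running `summarize` over the full log of a crashed container (for example, the output of `kubectl logs --previous`) would show whether the SIGSEGV consistently follows a DNS timeout and how many chunks were dropped beforehand.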

Expected behavior: Fluent Bit should continue running and keep retrying to send the logs to the target.

Your Environment

  • Version used: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable (couldn't find in the log the specific version)
  • Configuration:
  • Environment name and version: Amazon Elastic Kubernetes Service - eks.16
  • Server type and version: Amazon Linux 2 - Kernel version 5.10.236-228.935.amzn2.x86_64

Additional context: We lose all logs during the DNS blip. After the DNS issue is resolved, Fluent Bit is unable to send the application logs that were generated during the blip. We saw a few that were able to send logs after the blip, but some logs are totally lost.

thiagobdp • Sep 04 '25 16:09

Fluent Bit version?

edsiper • Sep 04 '25 17:09

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] • Dec 08 '25 02:12