helm-charts
[fluent-bit] Liveness/Readiness probe fails when running on AWS EKS 1.20 with Bottlerocket OS
Cross reference to https://github.com/fluent/fluent-bit/issues/3521
Pods deployed by this chart fail the liveness probe when running on AWS EKS 1.20 with Bottlerocket OS. As far as I can tell, fluent-bit's http_server stops listening almost immediately after the pod starts, which causes the readiness and liveness probes to fail. This happens only on nodes running Bottlerocket OS; regular nodes running Amazon Linux 2 run fluent-bit pods just fine.
To reproduce:
- Run AWS EKS 1.20 with Bottlerocket nodes https://docs.aws.amazon.com/eks/latest/userguide/launch-node-bottlerocket.html
- Deploy the fluent-bit helm chart with default values, except for a dummy input, a null output, and empty filters
- Observe how fluent-bit pods go into CrashLoopBackOff
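For anyone trying to reproduce this, the override described in the second step might look roughly like the values file below. This is only a sketch: the file name is hypothetical, and it assumes the chart takes its pipeline as classic-format strings under `config.inputs`, `config.filters` and `config.outputs`. Check the `values.yaml` of the chart version you are using before relying on it.

```yaml
# values-repro.yaml (hypothetical file name)
# Assumption: the chart reads the Fluent Bit pipeline from config.inputs,
# config.filters and config.outputs as classic-format config strings.
# Everything not listed here is left at the chart defaults.
config:
  # Built-in "dummy" input plugin: generates synthetic records
  inputs: |
    [INPUT]
        Name dummy
        Tag  dummy.log

  # No filters
  filters: ""

  # Built-in "null" output plugin: discards every record
  outputs: |
    [OUTPUT]
        Name  null
        Match *
```

A file like this would typically be passed to `helm install` with `-f values-repro.yaml`, so the probes stay at their chart defaults while the pipeline itself does no real work.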
As a workaround, you should be able to disable the probes until https://github.com/fluent/fluent-bit/issues/3521 is resolved.
https://github.com/bottlerocket-os/bottlerocket/issues/1628 is tracking a workaround on the Bottlerocket side. Downgrading to Bottlerocket 1.19 is also supposed to fix the issue.
Fixed with https://github.com/bottlerocket-os/bottlerocket/releases/tag/v1.1.3
But this is more of a workaround: it makes kubelet's cpuManagerPolicy: none the default. With cpuManagerPolicy: static, the issue still persists.
How do I disable the probes? "enabled: false" isn't an accepted value in the chart
> Fixed with https://github.com/bottlerocket-os/bottlerocket/releases/tag/v1.1.3
> But this is more of a workaround: it makes kubelet's cpuManagerPolicy: none the default. With cpuManagerPolicy: static, the issue still persists.
Agreed. I verified that this issue persists in fluentbit 1.1 and k8s 1.22 with cpuManagerPolicy set to "static". The Bottlerocket change is just a temporary solution, not a permanent fix for the underlying issue.
I have the same issue (on some nodes, not all of them) running EKS 1.24 with the Bottlerocket AMI and Fluent Bit 2.0.1.