Aggregator Not Sending Logs to Outputs After Running for a Few Hours
Issue Description:
Problem: After deploying the Fluent Bit Aggregator Helm Chart and running it for a few hours, it stops sending logs to Elasticsearch and Syslog, which are the intended destinations for log forwarding.
Expected Behavior: The Fluent Bit Aggregator should consistently and reliably forward logs to the specified Elasticsearch and Syslog destinations as configured in the Helm Chart.
Steps to Reproduce:
1. Deploy the Fluent Bit Aggregator using the provided Helm Chart.
2. Monitor the log forwarding functionality for a few hours.
3. Observe that log forwarding to Elasticsearch and Syslog ceases after a certain period.
Actual Results: After an initial period of successful log forwarding, the Fluent Bit Aggregator stops sending logs to Elasticsearch and Syslog without any apparent errors or warnings.
Environment Details:
Kubernetes Cluster Version: 1.26
Fluent Bit Agents Version: 2.1.8
Fluent Bit Aggregator Version: 2.1.9
Elasticsearch Version: 8.9
aggregator config:
[SERVICE]
    daemon false
    http_Port 2020
    http_listen 0.0.0.0
    http_server true
    log_level debug
    parsers_file /fluent-bit/etc/parsers.conf
    storage.metrics true
    storage.path /fluent-bit/data

[INPUT]
    name forward
    listen 0.0.0.0
    port 24224

[FILTER]
    Name rewrite_tag
    Match kube.*
    Rule $syslog ^(true)$ syslog.* true
    Emitter_Name re_emitted

[OUTPUT]
    Name syslog
    Match syslog.*
    Host $HOST
    Port 514
    Retry_Limit false
    Mode tcp
    Syslog_Format rfc5424
    Syslog_MaxSize 65536
    Syslog_Hostname_Key hostname
    Syslog_Appname_Key appname
    Syslog_Procid_Key procid
    Syslog_Msgid_Key msgid
    Syslog_SD_Key uls@0
    Syslog_Message_Key msg

[OUTPUT]
    Name es
    Match kube.*
    HTTP_User $USER
    HTTP_Passwd $PASS
    tls Off
    tls.verify Off
    Host elastic-elasticsearch
    Port 9200
    Retry_Limit False
    Trace_Error On
    Trace_Output Off
    Suppress_Type_Name On
    Replace_Dots On
    Buffer_Size False
    Logstash_Prefix logstash
    Logstash_Format On
    Index logstash
    Generate_ID On
    Write_Operation upsert

[OUTPUT]
    Name es
    Match host.*
    HTTP_User $USER
    HTTP_Passwd $PASS
    tls Off
    tls.verify Off
    Host elastic-elasticsearch
    Port 9200
    Retry_Limit False
    Trace_Error On
    Trace_Output Off
    Suppress_Type_Name On
    Replace_Dots On
    Buffer_Size False
    Logstash_Prefix logstash
    Logstash_Format On
    Index logstash
    Write_Operation upsert
    Generate_ID On
fluent-bit agents config:
custom_parsers.conf:
----
[PARSER]
    Name docker_no_time
    Format json
    Time_Keep Off
    Time_Key time
    Time_Format %Y-%m-%dT%H:%M:%S.%L

[FILTER]
    Name grep
    Match *
    Exclude log liveness

[FILTER]
    Name grep
    Match *
    Exclude log readiness

[SERVICE]
    Daemon Off
    Flush 5
    Log_Level debug
    Parsers_File /fluent-bit/etc/parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    Exclude_Path /var/log/containers/*_monitoring_*.log
    multiline.parser docker, cri
    Tag kube.*
    Mem_Buf_Limit 50MB
    Buffer_Max_Size 1MB
    Skip_Long_Lines Off

[INPUT]
    Name systemd
    Tag host.*
    Systemd_Filter _SYSTEMD_UNIT=kubelet.service
    Read_From_Tail On

[FILTER]
    Name kubernetes
    Match kube.*
    Merge_Log On
    Keep_Log Off
    K8S-Logging.Parser On
    K8S-Logging.Exclude On

[OUTPUT]
    Name forward
    Match *
    Host fluent-bit-aggregator
    Port 24224
@Rmaabari this repo just hosts Helm charts, and the Fluent Bit Aggregator chart is a convenient way to run Fluent Bit as a StatefulSet. Your actual configuration is an input to the chart and isn't part of the chart logic.
If you're having trouble with Fluent Bit, have turned on debug logs, and think there is an issue, your best course of action would be to look at the existing issues and, if none match, open a new issue at fluent/fluent-bit.
Hi @stevehipwell, thanks for your reply. I am running the Fluent Bit agents via the original fluent-bit (DaemonSet) Helm chart, and your aggregator Helm chart as a StatefulSet.
As for the logs, nothing seems unusual; attaching some of them:
[2023/09/17 12:07:58] [debug] [out flush] cb_destroy coro_id=7942
[2023/09/17 12:07:58] [debug] [retry] re-using retry for task_id=1959 attempts=19
[2023/09/17 12:07:58] [ warn] [engine] failed to flush chunk '1-1694939682.183824748.flb', retry in 1069 seconds: task_id=1959, input=forward.0 > output=es.1 (out_id=1)
[2023/09/17 12:07:59] [debug] [output:es:es.1] task_id=1354 assigned to thread #1
[2023/09/17 12:07:59] [debug] [output:es:es.1] task_id=1642 assigned to thread #0
[2023/09/17 12:07:59] [debug] [output:es:es.1] task_id=685 assigned to thread #1
[2023/09/17 12:07:59] [debug] [upstream] KA connection #96 to elastic-elasticsearch:9200 has been assigned (recycled)
[2023/09/17 12:07:59] [debug] [upstream] KA connection #91 to elastic-elasticsearch:9200 has been assigned (recycled)
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [http_client] not using http_proxy for header
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [http_client] not using http_proxy for header
[2023/09/17 12:07:59] [debug] [upstream] KA connection #89 to elastic-elasticsearch:9200 has been assigned (recycled)
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [out_es] converted_size is 0
[2023/09/17 12:07:59] [debug] [http_client] not using http_proxy for header
@Rmaabari the interesting logs would be from when the output to ES failed. But unless it's caused by a defect in the chart, you're going to need to open an issue on the Fluent Bit repo to figure out whether this is a bug or a configuration issue.
If you provide me with the chart values you used and the steps to resolve a failure, I can take a look. Also, do you lose logs as part of this?
Have you checked the logs on the ES side to see if there is an issue there? If ES is erroring and FB has no persistence, a restart fixing the issue would indicate that there is a problem with the configuration and/or log content.
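(Editor's note: for reference, a minimal sketch of how filesystem buffering could be enabled on the aggregator so queued chunks survive a pod restart and the in-memory backlog stays bounded. The sizes and the storage.sync mode below are illustrative assumptions, not values taken from this thread, and the storage.path would still need to be backed by a persistent volume for data to survive restarts.)

[SERVICE]
    # persist chunks to disk instead of keeping them only in memory
    storage.path /fluent-bit/data
    storage.sync normal
    # limit how much of the on-disk backlog is loaded back into memory
    storage.backlog.mem_limit 16M

[INPUT]
    name forward
    listen 0.0.0.0
    port 24224
    # buffer chunks from this input on the filesystem path above
    storage.type filesystem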
@stevehipwell thanks again for the response! I will gladly supply you with the helm chart values.
values:
service:
  type: NodePort
  annotations: {}
  httpPort: 2020
  additionalPorts:
    - name: http-forward
      port: 24224
      containerPort: 24224
      protocol: TCP
config:
  log_level: debug
  http_listen: "0.0.0.0"
  pipeline: |-
    [INPUT]
        name forward
        listen 0.0.0.0
        port 24224

    [FILTER]
        Name rewrite_tag
        Match kube.*
        Rule $syslog ^(true)$ syslog.* false
        Emitter_Name re_emitted

    [OUTPUT]
        Name syslog
        Match syslog.*
        Host $SYSLOG_SERVER
        Port 514
        Retry_Limit false
        Mode tcp
        Syslog_Format rfc5424
        Syslog_MaxSize 65536
        Syslog_Hostname_Key hostname
        Syslog_Appname_Key appname
        Syslog_Procid_Key procid
        Syslog_Msgid_Key msgid
        Syslog_SD_Key uls@0
        Syslog_Message_Key msg

    [OUTPUT]
        Name es
        Match kube.*
        HTTP_User $USER
        HTTP_Passwd $PASS
        tls Off
        tls.verify Off
        Host elastic-elasticsearch
        Port 9200
        Retry_Limit False
        Trace_Error On
        Trace_Output Off
        Suppress_Type_Name On
        Replace_Dots On
        Buffer_Size False
        Logstash_Prefix logstash
        Logstash_Format On
        Index logstash

    [OUTPUT]
        Name es
        Match host.*
        HTTP_User $USER
        HTTP_Passwd $PASS
        tls Off
        tls.verify Off
        Host elastic-elasticsearch
        Port 9200
        Retry_Limit False
        Trace_Error On
        Trace_Output Off
        Suppress_Type_Name On
        Replace_Dots On
        Buffer_Size False
        Logstash_Prefix logstash
        Logstash_Format On
        Index logstash
Since the log level is set to debug, I am unable to pinpoint exactly when logs stopped being sent to Elasticsearch. I have observed, however, that after a couple of hours with no logs being sent, a very small number of logs (around 20 documents) arrive for a single minute, none of them carrying the Kubernetes filter metadata, and then nothing is sent again.
The only thing that resolves the issue is restarting the StatefulSet, after which logs are sent to all expected outputs again.
I will also open an issue on the original fluent-bit Helm chart repository.
Here is a screenshot of the logs in the Kibana view:
@Rmaabari how have you configured the persistence?
I'm not sure your ES output configuration is correct; it looks like you're not constraining the retries or the buffer?
I'm currently on annual leave, so I can't get everything on a screen to review how you've got this set up. Please add a link in this issue to the FB issue you open.
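(Editor's note: for illustration, a minimal sketch of an ES output section with the retries and response buffer constrained, along the lines suggested above. The specific limits are assumptions and would need tuning; storage.total_limit_size only takes effect once filesystem storage is enabled for the matching input.)

[OUTPUT]
    Name es
    Match kube.*
    Host elastic-elasticsearch
    Port 9200
    Logstash_Format On
    Logstash_Prefix logstash
    Suppress_Type_Name On
    Replace_Dots On
    # cap retries instead of Retry_Limit False (retry forever)
    Retry_Limit 5
    # fixed response buffer instead of Buffer_Size False (unlimited)
    Buffer_Size 512KB
    # cap the amount of data this output may keep queued on disk
    storage.total_limit_size 500M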