fluentd-kubernetes-daemonset
`free(): invalid pointer` with latest fluent/fluentd-kubernetes-daemonset:v1-debian-forward-arm64 image
### Describe the bug
Using the latest `v1-debian-forward-arm64` image results in the container throwing `free(): invalid pointer` and restarting constantly, which leads to a node eviction.
### To Reproduce
I have provided a redacted config to reproduce the issue.
### Expected behavior
The worker should come up and stay up.
### Your Environment
- Tag in use: fluentd-kubernetes-daemonset:v1-debian-forward-arm64
### Your Configuration
@include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
@include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"
@include conf.d/*.
<label @FLUENT_LOG>
<match fluent.**>
@type null
@id ignore_fluent_logs
</match>
</label>
<match kubelet>
@type null
</match>
<filter kubernetes.**>
@type kubernetes_metadata
@id filter_kube_metadata
kubernetes_url "#{ENV['FLUENT_FILTER_KUBERNETES_URL'] || 'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
verify_ssl "#{ENV['KUBERNETES_VERIFY_SSL'] || true}"
ca_file "#{ENV['KUBERNETES_CA_FILE']}"
skip_labels "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_LABELS'] || 'false'}"
skip_container_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_CONTAINER_METADATA'] || 'false'}"
skip_master_url "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_MASTER_URL'] || 'false'}"
skip_namespace_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_NAMESPACE_METADATA'] || 'false'}"
watch "#{ENV['FLUENT_KUBERNETES_WATCH'] || 'true'}"
</filter>
<source>
@type tail
@id in_tail_container_logs
path "#{ENV['FLUENT_CONTAINER_TAIL_PATH'] || '/var/log/containers/*.log'}"
pos_file "#{File.join('/var/log/', ENV.fetch('FLUENT_POS_EXTRA_DIR', ''), 'fluentd-containers.log.pos')}"
tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
read_from_head true
<parse>
@type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
time_format "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT'] || '%Y-%m-%dT%H:%M:%S.%NZ'}"
</parse>
</source>
<filter qfunctions.**>
@type record_transformer
enable_ruby true
<record>
message ${record["message"].gsub(/^.*std(out|err):\s/, '')}
</record>
</filter>
<filter qfunctions.**>
@type parser
format json
key_name message
emit_invalid_record_to_error false
</filter>
<match qfunctions.**>
@type rewrite_tag_filter
<rule>
key tenant_id
pattern /^abc1234$/
tag abc1234
</rule>
<rule>
key tenant_id
pattern /.+/
tag clear
</rule>
</match>
<match abc1234.**>
@type http
@id out_abc1234
@log_level info
endpoint "#{ENV['ENDPOINT']}"
http_method post
content_type application/json
json_array true
<format>
@type json
</format>
headers {"X-P-Stream": "functions", "X-P-Meta-Org-Id": "abc1234"}
<auth>
method basic
username "#{ENV['USERNAME']}"
password "#{ENV['PASSWORD']}"
</auth>
</match>
<match clear>
@type null
</match>
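For readers who want to check the `qfunctions.**` routing in isolation, here is a minimal Ruby sketch of what the `record_transformer` and `rewrite_tag_filter` blocks above appear to do: strip everything up to `stdout: ` or `stderr: `, parse the remainder as JSON, then pick the tag from the first rule whose pattern matches `tenant_id`. The sample line, the helper names `strip_prefix` and `route`, and the first-match-wins rule order are assumptions for illustration. This is not the plugins' actual code.

```ruby
require 'json'

# Hypothetical sample line; the "stdout: " prefix is an assumption about the input.
SAMPLE = '2024/01/17 15:48:14 stdout: {"tenant_id":"abc1234","msg":"hello"}'

# Same pattern as in the record_transformer filter of the config.
PREFIX = /^.*std(out|err):\s/

# Same patterns and tags as the two <rule> blocks, in the same order.
RULES = [
  [/^abc1234$/, 'abc1234'],
  [/.+/, 'clear']
].freeze

# Drop everything up to and including "stdout: " or "stderr: ".
def strip_prefix(message)
  message.gsub(PREFIX, '')
end

# Return the tag of the first rule matching tenant_id, or nil if none match.
def route(record)
  value = record['tenant_id'].to_s
  rule = RULES.find { |pattern, _tag| pattern.match?(value) }
  rule && rule[1]
end

record = JSON.parse(strip_prefix(SAMPLE))
puts route(record)
```

In this sketch a record without `tenant_id` matches neither rule, so `route` returns `nil`.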
### Your Error Log
```shell
2024-01-17 15:48:14 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-01-17 15:48:15 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-01-17 15:48:15 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2024-01-17 15:48:15 +0000 [info]: adding match pattern="kubelet" type="null"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="qfunctions.**" type="record_transformer"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="qfunctions.**" type="parser"
2024-01-17 15:48:15 +0000 [info]: adding match pattern="qfunctions.**" type="rewrite_tag_filter"
2024-01-17 15:48:15 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff7b7b91b8 @keys="tenant_id">, /^abc1234$/, "", "abc1234", nil]
2024-01-17 15:48:15 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff7b7b8790 @keys="tenant_id">, /.+/, "", "clear", nil]
2024-01-17 15:48:15 +0000 [info]: adding match pattern="abc1234.**" type="http"
2024-01-17 15:48:15 +0000 [warn]: #0 [out_abc1234] Status code 503 is going to be removed from default `retryable_response_codes` from fluentd v2. Please add it by yourself if you wish
2024-01-17 15:48:15 +0000 [info]: adding match pattern="clear" type="null"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="prometheus"
2024-01-17 15:48:15 +0000 [info]: adding source type="prometheus_output_monitor"
2024-01-17 15:48:15 +0000 [info]: adding source type="tail"
2024-01-17 15:48:15 +0000 [info]: #0 starting fluentd worker pid=361 ppid=6 worker=0
2024-01-17 15:48:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/contact-task-runtime-5cbd49696c-fmqkz_openfaas-fn_contact-task-runtime-90840620b3e6f1d26b85a666402b31aa3a5d5f9faf8f2388c919c87c5ce082a1.log
2024-01-17 15:48:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ground-task-runtime-65446d7bcc-527dl_openfaas-fn_ground-task-runtime-b12db1d88da3a582965a7ff372367d9676e9e640f505694022c6f5da97649e46.log
2024-01-17 15:48:15 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer
2024-01-17 15:48:17 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-01-17 15:48:18 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-01-17 15:48:18 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2024-01-17 15:48:18 +0000 [info]: adding match pattern="kubelet" type="null"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="qfunctions.**" type="record_transformer"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="qfunctions.**" type="parser"
2024-01-17 15:48:18 +0000 [info]: adding match pattern="qfunctions.**" type="rewrite_tag_filter"
2024-01-17 15:48:18 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff8cd245b0 @keys="tenant_id">, /^org_2Jf4UxF6FEwCMecX$/, "", "abc1234", nil]
2024-01-17 15:48:18 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff8cd23f98 @keys="tenant_id">, /.+/, "", "clear", nil]
2024-01-17 15:48:18 +0000 [info]: adding match pattern="abc1234.**" type="http"
2024-01-17 15:48:18 +0000 [warn]: #0 [out_abc1234] Status code 503 is going to be removed from default `retryable_response_codes` from fluentd v2. Please add it by yourself if you wish
2024-01-17 15:48:18 +0000 [info]: adding match pattern="clear" type="null"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="prometheus"
2024-01-17 15:48:18 +0000 [info]: adding source type="prometheus_output_monitor"
2024-01-17 15:48:18 +0000 [info]: adding source type="tail"
2024-01-17 15:48:18 +0000 [info]: #0 starting fluentd worker pid=376 ppid=6 worker=0
2024-01-17 15:48:18 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/contact-task-runtime-5cbd49696c-fmqkz_openfaas-fn_contact-task-runtime-90840620b3e6f1d26b85a666402b31aa3a5d5f9faf8f2388c919c87c5ce082a1.log
2024-01-17 15:48:18 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ground-task-runtime-65446d7bcc-527dl_openfaas-fn_ground-task-runtime-b12db1d88da3a582965a7ff372367d9676e9e640f505694022c6f5da97649e46.log
2024-01-17 15:48:18 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer
```
### Additional context
We have a daemonset in a cluster that has been running for about 22 days, and it does not show the invalid pointer issue.
The SHA-256 digest we are having issues with: 59886dc179d52a43dfdf061c764e9856dafc67c41dd78e9d868872000d9e660a
Reverting to this digest: f0c0d41aba562c5f4ce13f2b00ae50c381925063cfcc7ec7a9f2a4f622ee9535
does not throw the invalid pointer error.
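To quantify the restart loop, the sketch below takes a few lines copied from the error log above and computes the seconds between consecutive `SIGABRT` exits. It assumes each relevant line starts with a 25-character timestamp in the `%Y-%m-%d %H:%M:%S %z` layout; the helper names `crash_times` and `crash_intervals` are hypothetical.

```ruby
require 'time'

# Excerpt of the error log from this report.
LOG = <<~LOG
  2024-01-17 15:48:14 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
  2024-01-17 15:48:15 +0000 [info]: #0 starting fluentd worker pid=361 ppid=6 worker=0
  free(): invalid pointer
  2024-01-17 15:48:17 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
  2024-01-17 15:48:18 +0000 [info]: #0 starting fluentd worker pid=376 ppid=6 worker=0
LOG

# Timestamps of every line that reports a worker killed by SIGABRT.
def crash_times(log)
  log.each_line
     .select { |line| line.include?('exited unexpectedly with signal SIGABRT') }
     .map { |line| Time.strptime(line[0, 25], '%Y-%m-%d %H:%M:%S %z') }
end

# Whole seconds between consecutive crashes.
def crash_intervals(log)
  crash_times(log).each_cons(2).map { |earlier, later| (later - earlier).to_i }
end

p crash_intervals(LOG) # prints [3]
```

For this excerpt the worker aborts again 3 seconds after the previous crash, which matches the constant restarts described in the report.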
I have the same issue with fluent/fluentd-kubernetes-daemonset:v1-debian-cloudwatch. Reverting to this sha: b7185b3483d2ca5c3e923e33641dd3814865321b34da05c46eda96576da905a0 does not throw the error either. Log: v1-debian-cloudwatch.log
Also seeing this in the fluent/fluentd-kubernetes-daemonset:v1.16.5-debian-forward-1.0 image; logging fails.
2024-04-03 20:27:34 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/node-problem-detector-kwwk8_kube-system_node-problem-detector-4e2796e4c3ca14953fda355aca52c0200a0f53b7b0596d7e94ec89169c782f8a.log
2024-04-03 20:27:34 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/unbound-exporter-llm48_unbound_unbound-exporter-bd636614623be73dc03069f9a0fefffb779c47d2c034e796d3364fb49fb2e6fe.log
2024-04-03 20:27:34 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/unbound-exporter-llm48_unbound_unbound-exporter-init-1b88c92fa871c07c66d558a84a656879a1b13dfa12c6b533b37ec9ae74fc555f.log
2024-04-03 20:27:34 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer
2024-04-03 20:27:37 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-04-03 20:27:37 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
NOTE:
- https://github.com/fluent/fluentd-kubernetes-daemonset/issues/1478#issuecomment-1896114623
  - mentions the v1-debian-forward-arm64 digest (59886dc179d52a43dfdf061c764e9856dafc67c41dd78e9d868872000d9e660a), which the reporter marked as problematic.
  - v1.16.3-debian-forward-arm64-2.0
- https://github.com/fluent/fluentd-kubernetes-daemonset/issues/1478#issuecomment-1896122901
  - mentions the v1-debian-forward-arm64 digest (f0c0d41aba562c5f4ce13f2b00ae50c381925063cfcc7ec7a9f2a4f622ee9535), which the reporter marked as not problematic.
  - v1.16-debian-forward-arm64-1
- https://github.com/fluent/fluentd-kubernetes-daemonset/issues/1478#issuecomment-1903352220
  - mentions the v1-debian-cloudwatch-amd64 digest (b7185b3483d2ca5c3e923e33641dd3814865321b34da05c46eda96576da905a0), which the reporter marked as not problematic.
  - v1.16.3-debian-cloudwatch-1.0 (systemd-journal 1.4.2)
- https://github.com/fluent/fluentd-kubernetes-daemonset/issues/1478#issuecomment-2037129830
  - marked problematic
  - mentions 1.16.5-debian-forward-1.0; it is not clear whether the issue is specific to amd64 or arm64.
v1.17-debian-forward-1.3 or v1.16.5-debian-forward-1.3 will fix this issue.