fluentd-kubernetes-daemonset
Fluentd encountered a memory error (free(): invalid pointer)
Describe the bug
Hi, I run Fluentd on EKS. After upgrading the cluster from 1.24 to 1.25, every Fluentd pod on my two clusters (staging and production) started crashing with a free(): invalid pointer error and filling the pod's disk space with core.* files, which I assume are core dumps. Those files push the node into DiskPressure, so my Fluentd pods get evicted and restarted. This happens on every node in both clusters.
To Reproduce
Expected behavior
Your Environment
- Tag of fluentd-kubernetes-daemonset in use:
docker.io/fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
docker.io/fluent/fluentd-kubernetes-daemonset@sha256:8e330e587cc0a918fca59a1e07cfd80340d8002032cd62e359b3e30a9da4b2ee
Your Configuration
# AUTOMATICALLY GENERATED
# DO NOT EDIT THIS FILE DIRECTLY, USE /templates/conf/fluent.conf.erb
@include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
@include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"
@include kubernetes.conf
@include conf.d/*.conf
<filter kubernetes.var.log.containers.**----**.log>
  @type parser
  key_name log
  reserve_data true
  reserve_time true
  hash_value_field -------------fields
  remove_key_name_field false
  replace_invalid_sequence false
  emit_invalid_record_to_error true
  <parse>
    @type json
  </parse>
</filter>

<filter kubernetes.var.log.containers.**----**.log kubernetes.var.log.containers.----**.log kubernetes.var.log.containers.-----**.log kubernetes.var.log.containers.-----**.log>
  @type parser
  key_name log
  reserve_data true
  reserve_time true
  hash_value_field fields
  remove_key_name_field false
  replace_invalid_sequence false
  emit_invalid_record_to_error true
  <parse>
    @type json
  </parse>
</filter>

<label @ERROR>
  <match **>
    @type null
  </match>
</label>
<match kubernetes.var.log.containers.*----**.log kubernetes.var.log.containers.----**.log kubernetes.var.log.containers.-----**.log kubernetes.var.log.containers.-----**.log>
  @type elasticsearch
  @id out_es
  @log_level debug
  include_tag_key true
  hosts "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
  port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
  path "#{ENV['FLUENT_ELASTICSEARCH_PATH']}"
  scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
  ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"
  ssl_version "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERSION'] || 'TLSv1_2'}"
  user "#{ENV['FLUENT_ELASTICSEARCH_USER'] || use_default}"
  password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD'] || use_default}"
  reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'false'}"
  reconnect_on_error "#{ENV['FLUENT_ELASTICSEARCH_RECONNECT_ON_ERROR'] || 'true'}"
  reload_on_failure "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_ON_FAILURE'] || 'true'}"
  log_es_400_reason "#{ENV['FLUENT_ELASTICSEARCH_LOG_ES_400_REASON'] || 'false'}"
  logstash_prefix "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_PREFIX'] || 'logstash'}"
  logstash_dateformat "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"
  logstash_format "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_FORMAT'] || 'true'}"
  index_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_INDEX_NAME'] || 'logstash'}"
  target_index_key "#{ENV['FLUENT_ELASTICSEARCH_TARGET_INDEX_KEY'] || use_nil}"
  type_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_TYPE_NAME'] || 'fluentd'}"
  include_timestamp "#{ENV['FLUENT_ELASTICSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"
  template_name "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_NAME'] || use_nil}"
  template_file "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_FILE'] || use_nil}"
  template_overwrite "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_OVERWRITE'] || use_default}"
  sniffer_class_name "#{ENV['FLUENT_SNIFFER_CLASS_NAME'] || 'Fluent::Plugin::ElasticsearchSimpleSniffer'}"
  request_timeout "#{ENV['FLUENT_ELASTICSEARCH_REQUEST_TIMEOUT'] || '5s'}"
  suppress_type_name "#{ENV['FLUENT_ELASTICSEARCH_SUPPRESS_TYPE_NAME'] || 'true'}"
  enable_ilm "#{ENV['FLUENT_ELASTICSEARCH_ENABLE_ILM'] || 'false'}"
  ilm_policy_id "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_ID'] || use_default}"
  ilm_policy "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY'] || use_default}"
  ilm_policy_overwrite "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_OVERWRITE'] || 'false'}"
  <buffer>
    flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"
    flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '1s'}"
    chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '6M'}"
    queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"
    retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"
    retry_forever true
  </buffer>
</match>
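
For reference, both parser filters above set emit_invalid_record_to_error true, so any record whose log field is not valid JSON is rerouted to the @ERROR label and silently dropped by its null match. While debugging this crash, a minimal sketch of how those rejected records could be surfaced instead, using Fluentd's built-in stdout output plugin (an illustration only, not part of my running config):

<label @ERROR>
  <match **>
    # Illustration: print records that failed JSON parsing to Fluentd's own log
    # instead of discarding them, to check whether parse failures coincide with the crash.
    @type stdout
  </match>
</label>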
Your Error Log
2024-02-27 22:41:28 +0000 [warn]: #0 no patterns matched tag="kubernetes.var.log.containers.fluentd-mmslx_kube-system_fluentd-cb6cda03e3dddeffd0f5ae8932bc6b60991716e544dd48549f8a7aaa1657ad1d.log"
free(): invalid pointer
2024-02-27 22:41:28 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-02-27 22:41:29 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-02-27 22:41:29 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2024-02-27 22:41:29 +0000 [info]: adding match in @ERROR pattern="**" type="null"
2024-02-27 22:41:29 +0000 [info]: adding filter pattern="kubernetes.var.log.containers.**----**.log kubernetes.var.log.containers.----**.log kubernetes.var.log.containers.-----**.log kubernetes.var.log.containers.mono-**.log" type="kubernetes_metadata"
2024-02-27 22:41:30 +0000 [info]: adding filter pattern="kubernetes.var.log.containers.**----**.log" type="parser"
2024-02-27 22:41:30 +0000 [info]: adding filter pattern="kubernetes.var.log.containers.**----**.log kubernetes.var.log.containers.customer-insights**.log kubernetes.var.log.containers.-----**.log kubernetes.var.log.containers.-----**.log" type="parser"
2024-02-27 22:41:30 +0000 [info]: adding match pattern="kubernetes.var.log.containers.**----**.log kubernetes.var.log.containers.----**.log kubernetes.var.log.containers.-----**.log kubernetes.var.log.containers.-----**.log" type="elasticsearch"
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'host localhost' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host: localhost' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'index_name logstash' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'index_name: logstash' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'template_name ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'template_name: ' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'logstash_prefix k8s_test' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_prefix: k8s_test' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' has timestamp placeholders, but chunk key 'time' is not configured
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'deflector_alias ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'deflector_alias: ' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'application_name default' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'application_name: default' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'ilm_policy_id logstash-policy' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'ilm_policy_id: logstash-policy' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] Need substitution: false
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] 'host_placeholder localhost' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host_placeholder: localhost' doesn't have tag placeholder
2024-02-27 22:41:30 +0000 [info]: adding source type="systemd"
2024-02-27 22:41:30 +0000 [info]: adding source type="systemd"
2024-02-27 22:41:30 +0000 [info]: adding source type="systemd"
2024-02-27 22:41:30 +0000 [info]: adding source type="prometheus"
2024-02-27 22:41:30 +0000 [info]: adding source type="prometheus_output_monitor"
2024-02-27 22:41:30 +0000 [info]: adding source type="tail"
2024-02-27 22:41:30 +0000 [info]: #0 starting fluentd worker pid=5673 ppid=7 worker=0
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] buffer started instance=2380 stage_size=0 queue_size=0
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/aws-node-hn6rp_kube-system_aws-node-fdf6554cbb884b7b86afb744c78d69c70a1d3dc821e1564ff27dfd9b9.log
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] flush_thread actually running
2024-02-27 22:41:30 +0000 [debug]: #0 [out_es] enqueue_thread actually running
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/aws-node-hn6rp_kube-system_aws-vpc-cni-init-1f72b17e3b976fc6a1d4a602766f86d66a099d2c513326b5863eed200597c190.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/coredns-867c6c88f7-sz86h_kube-system_coredns-2c6932efa35faabbfa4febf1f757de07e227c77d445bceb4f12d685535297f94.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/-----86d8cb7cc8-h5bhg_production_----112b19f1842c6c6bf0f24600d409bd04a817a5bc46128b5bab4040bb080b0e03.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/datadog-nh5rp_default_agent-806602e5eaa936d1eafac6d226ac5b862187a6681f5dee002992b5e02f172d97.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/datadog-nh5rp_default_init-config-1991f45a20d8658c04f0bcf888810d9753340f663d6ac20176ad054c258245c7.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/datadog-nh5rp_default_init-volume-a3a91a21c99f73b996cfe429f51f7753dd4d86256fe63793f871ef0e11bda0a6.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/datadog-nh5rp_default_process-agent-5c39e03c20b2f2c89f4e7cdc80ebd1a1e182fa3e91b4e567f73ec066d5acb6c5.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/datadog-nh5rp_default_trace-agent-b38f19a4c516387d5b963ac41be2c0021eebd521ff373e25906556a12593d241.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ebs-csi-node-j2xpl_kube-system_ebs-plugin-a60ffdc9990e7e3b8f76d5630b4bc67431ca6bf3167fe21a67da9a6ea0c4642a.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ebs-csi-node-j2xpl_kube-system_liveness-probe-397b00b61201e9a6c657c1d0fcb86f24c328bdf05f216e403876563dce2c7cd6.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ebs-csi-node-j2xpl_kube-system_node-driver-registrar-4a995c4d4618928a22df0d780dbdd4789b567b07762b02a5a9b1f0ef9ed605d0.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/fluentd-mmslx_kube-system_fluentd-cb6cda03e3dddeffd0f5ae8932bc6b60991716e544dd48549f8a7aaa1657ad1d.log
2024-02-27 22:41:30 +0000 [warn]: #0 no patterns matched tag="kubernetes.var.log.containers.fluentd-mmslx_kube-system_fluentd-cb6cda03e3dddeffd0f5ae8932bc6b60991716e544dd48549f8a7aaa1657ad1d.log"
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ingress-nginx-controller-68bf6f97f8-fmzvc_ingress-nginx_controller-e3313fe55f5d00f0ff1f6cb42063766eb1ec7de182d0396c642f2c9f0d900a63.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/kube-proxy-gtlbz_kube-system_kube-proxy-f5569c3d4daebd12ddb1a1d19dc40a464bfe5e9679526a7955c4980e2b8867db.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/metabase-8794dbff-6fdss_metabase_metabase-22045c511479498f0047b2834ef80f131263b7fcf76c56465b8f0ba776db7d1d.log
2024-02-27 22:41:30 +0000 [warn]: #0 no patterns matched tag="kubernetes.var.log.containers.metabase-8794dbff-6fdss_metabase_metabase-22045c511479498f0047b2834ef80f131263b7fcf76c56465b8f0ba776db7d1d.log"
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/-----59549f6c8-b8s6g_production_mongodb-22d35052f44b57e86f9f76a592cc22ef310b5383e63337d23cd7b8a499d858ef.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/-----569bcc575b-6hj78_pre-production_-----2edebf4dee033b76138a1079a24919e5e932f991153c23b94724b211f4c87a81.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/-----55bdd5685b-pbsh4_production_---------cad4cdc7bc10ad9656639a6ac3876923923d535e24cfbe29f941e9751c08ea99.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/----i-75cbc9968f-dqpbg_production_----542e3a83acc70f82dbbb7233d391c6cb8ea49a3782b768015e6efb2a9bc82378.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/prometheus-prometheus-kube-prometheus-prometheus-0_monitoring_config-reloader-6430be63a516cbbe42929c702ff7a59b8e58213b2e85b72bb0d9a4e0643a77b2.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/prometheus-prometheus-kube-prometheus-prometheus-0_monitoring_init-config-reloader-1f8ae540ab86ccee67986b064fe4a6de552e4721a8b48af657be3d7ab9f5a0a6.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/prometheus-prometheus-kube-prometheus-prometheus-0_monitoring_prometheus-5789d16f8781ca6a13a735ea5f6934b0e82d555f18b371231b59d74d50273b17.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/prometheus-prometheus-node-exporter-kmr8c_monitoring_node-exporter-4e1ecf0fb558ec4d9d9889040dfa77ffa807397c15683a4399838716376c7b74.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/redis-master-0_pre-production_redis-56ae5b94f96bc40e048b06452a63f0a135b6cde1a08d84e206d1affaa0d99e6c.log
2024-02-27 22:41:30 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/redis-master-0_production_redis-cae8d648741fc4735117e80b888d7811d3b4dfab7b3bee002daefefdea31a1a4.log
2024-02-27 22:41:30 +0000 [info]: #0 fluentd worker is now running worker=0
2024-02-27 22:41:31 +0000 [warn]: #0 no patterns matched tag="kubelet"
2024-02-28 01:16:53 +0000 [info]: [filter_kube_metadata] 410 Gone encountered. Restarting pod watch to reset resource versions.410 Gone
2024-02-28 04:18:06 +0000 [info]: [filter_kube_metadata] 410 Gone encountered. Restarting pod watch to reset resource versions.410 Gone
2024-02-28 08:22:20 +0000 [info]: [filter_kube_metadata] 410 Gone encountered. Restarting pod watch to reset resource versions.410 Gone
2024-02-29 01:46:35 +0000 [info]: [filter_kube_metadata] 410 Gone encountered. Restarting pod watch to reset resource versions.410 Gone
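
A note on the repeated "no patterns matched" warnings above (the fluentd and metabase container tags and the kubelet tag): they only mean those tags reach no <match> block in the config. If needed, a catch-all placed after the Elasticsearch <match> would discard them explicitly and silence the warnings; this is a sketch only, not in my current config:

# Illustration: Fluentd evaluates <match> blocks in order, so a trailing catch-all
# swallows any tags the Elasticsearch output did not match.
<match **>
  @type null
</match>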
Additional context
kubernetes-fluentd-conf (the kubernetes.conf included above):
# AUTOMATICALLY GENERATED
# DO NOT EDIT THIS FILE DIRECTLY, USE /templates/conf/kubernetes.conf.erb
<label @FLUENT_LOG>
  <match fluent.**>
    @type null
    @id ignore_fluent_logs
  </match>
</label>
<filter kubernetes.var.log.containers.----.log kubernetes.var.log.containers.----.log kubernetes.var.log.containers.-----.log kubernetes.var.log.containers.-----**.log>
  @type kubernetes_metadata
  @id filter_kube_metadata
  kubernetes_url "#{ENV['FLUENT_FILTER_KUBERNETES_URL'] || 'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
  verify_ssl "#{ENV['KUBERNETES_VERIFY_SSL'] || true}"
  ca_file "#{ENV['KUBERNETES_CA_FILE']}"
  skip_labels "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_LABELS'] || 'false'}"
  skip_container_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_CONTAINER_METADATA'] || 'false'}"
  skip_master_url "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_MASTER_URL'] || 'false'}"
  skip_namespace_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_NAMESPACE_METADATA'] || 'false'}"
  watch "#{ENV['FLUENT_KUBERNETES_WATCH'] || 'true'}"
</filter>
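
About the periodic "410 Gone ... Restarting pod watch" messages in the error log: the kubernetes_metadata filter above keeps a watch on the Kubernetes API because watch defaults to true (FLUENT_KUBERNETES_WATCH is unset here). If live metadata updates are not needed, the watch can be turned off; a sketch with a hypothetical catch-all tag pattern, shown only as an illustration:

<filter kubernetes.var.log.containers.**>
  @type kubernetes_metadata
  # Illustration: with watch disabled the filter no longer watches pods/namespaces,
  # so the periodic "410 Gone" watch restarts in the log should stop appearing.
  watch false
</filter>

In this image that would normally be done by setting the FLUENT_KUBERNETES_WATCH environment variable to false rather than editing the generated file.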