fluentd-kubernetes-daemonset icon indicating copy to clipboard operation
fluentd-kubernetes-daemonset copied to clipboard

`free(): invalid pointer` with latest fluent/fluentd-kubernetes-daemonset:v1-debian-forward-arm64 image

Open smparekh opened this issue 5 months ago • 4 comments

Describe the bug

Using the latest v1-debian-forward-arm64 image results in the container throwing free(): invalid pointer and constantly restarting leading to a node eviction

To Reproduce

I have provided a redacted config to reproduce

Expected behavior

Worker should comeup and stay up

Your Environment

- Tag of using fluentd-kubernetes-daemonset:v1-debian-forward-arm64

Your Configuration

@include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"
    @include conf.d/*.

    <label @FLUENT_LOG>
      <match fluent.**>
        @type null
        @id ignore_fluent_logs
      </match>
    </label>

    <match kubelet>
      @type null
    </match>

    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
      kubernetes_url "#{ENV['FLUENT_FILTER_KUBERNETES_URL'] || 'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
      verify_ssl "#{ENV['KUBERNETES_VERIFY_SSL'] || true}"
      ca_file "#{ENV['KUBERNETES_CA_FILE']}"
      skip_labels "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_LABELS'] || 'false'}"
      skip_container_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_CONTAINER_METADATA'] || 'false'}"
      skip_master_url "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_MASTER_URL'] || 'false'}"
      skip_namespace_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_NAMESPACE_METADATA'] || 'false'}"
      watch "#{ENV['FLUENT_KUBERNETES_WATCH'] || 'true'}"
    </filter>

    <source>
      @type tail
      @id in_tail_container_logs
      path "#{ENV['FLUENT_CONTAINER_TAIL_PATH'] || '/var/log/containers/*.log'}"
      pos_file "#{File.join('/var/log/', ENV.fetch('FLUENT_POS_EXTRA_DIR', ''), 'fluentd-containers.log.pos')}"
      tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
      exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
      read_from_head true
      <parse>
        @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
        time_format "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT'] || '%Y-%m-%dT%H:%M:%S.%NZ'}"
      </parse>
    </source>

    <filter qfunctions.**>
      @type record_transformer
      enable_ruby true
      <record>
        message ${record["message"].gsub(/^.*std(out|err):\s/, '')}
      </record>
    </filter>

    <filter qfunctions.**>
      @type parser
      format json
      key_name message
      emit_invalid_record_to_error false
    </filter>

    <match qfunctions.**>
      @type rewrite_tag_filter
      <rule>
        key tenant_id
        pattern /^abc1234$/
        tag abc1234
      </rule>
      <rule>
        key tenant_id
        pattern /.+/
        tag clear
      </rule>
    </match>
    <match abc1234.**>
      @type http
      @id out_abc1234
      @log_level info
      
      endpoint "#{ENV['ENDPOINT']}"
      http_method post
      content_type application/json
      json_array true
      <format>
        @type json
      </format>
      headers {"X-P-Stream": "functions", "X-P-Meta-Org-Id": "abc1234"}
      <auth>
        method basic
        username "#{ENV['USERNAME']}"
        password "#{ENV['PASSWORD']}"
      </auth>
    </match>

    <match clear>
      @type null
    </match>


### Your Error Log

```shell
2024-01-17 15:48:14 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-01-17 15:48:15 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-01-17 15:48:15 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2024-01-17 15:48:15 +0000 [info]: adding match pattern="kubelet" type="null"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="qfunctions.**" type="record_transformer"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="qfunctions.**" type="parser"
2024-01-17 15:48:15 +0000 [info]: adding match pattern="qfunctions.**" type="rewrite_tag_filter"
2024-01-17 15:48:15 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff7b7b91b8 @keys="tenant_id">, /^abc1234$/, "", "abc1234", nil]
2024-01-17 15:48:15 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff7b7b8790 @keys="tenant_id">, /.+/, "", "clear", nil]
2024-01-17 15:48:15 +0000 [info]: adding match pattern="abc1234.**" type="http"
2024-01-17 15:48:15 +0000 [warn]: #0 [out_abc1234] Status code 503 is going to be removed from default `retryable_response_codes` from fluentd v2. Please add it by yourself if you wish
2024-01-17 15:48:15 +0000 [info]: adding match pattern="clear" type="null"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="prometheus"
2024-01-17 15:48:15 +0000 [info]: adding source type="prometheus_output_monitor"
2024-01-17 15:48:15 +0000 [info]: adding source type="tail"
2024-01-17 15:48:15 +0000 [info]: #0 starting fluentd worker pid=361 ppid=6 worker=0
2024-01-17 15:48:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/contact-task-runtime-5cbd49696c-fmqkz_openfaas-fn_contact-task-runtime-90840620b3e6f1d26b85a666402b31aa3a5d5f9faf8f2388c919c87c5ce082a1.log
2024-01-17 15:48:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ground-task-runtime-65446d7bcc-527dl_openfaas-fn_ground-task-runtime-b12db1d88da3a582965a7ff372367d9676e9e640f505694022c6f5da97649e46.log
2024-01-17 15:48:15 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer
2024-01-17 15:48:17 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-01-17 15:48:18 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-01-17 15:48:18 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2024-01-17 15:48:18 +0000 [info]: adding match pattern="kubelet" type="null"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="qfunctions.**" type="record_transformer"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="qfunctions.**" type="parser"
2024-01-17 15:48:18 +0000 [info]: adding match pattern="qfunctions.**" type="rewrite_tag_filter"
2024-01-17 15:48:18 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff8cd245b0 @keys="tenant_id">, /^org_2Jf4UxF6FEwCMecX$/, "", "abc1234", nil]
2024-01-17 15:48:18 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff8cd23f98 @keys="tenant_id">, /.+/, "", "clear", nil]
2024-01-17 15:48:18 +0000 [info]: adding match pattern="abc1234.**" type="http"
2024-01-17 15:48:18 +0000 [warn]: #0 [out_abc1234] Status code 503 is going to be removed from default `retryable_response_codes` from fluentd v2. Please add it by yourself if you wish
2024-01-17 15:48:18 +0000 [info]: adding match pattern="clear" type="null"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="prometheus"
2024-01-17 15:48:18 +0000 [info]: adding source type="prometheus_output_monitor"
2024-01-17 15:48:18 +0000 [info]: adding source type="tail"
2024-01-17 15:48:18 +0000 [info]: #0 starting fluentd worker pid=376 ppid=6 worker=0
2024-01-17 15:48:18 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/contact-task-runtime-5cbd49696c-fmqkz_openfaas-fn_contact-task-runtime-90840620b3e6f1d26b85a666402b31aa3a5d5f9faf8f2388c919c87c5ce082a1.log
2024-01-17 15:48:18 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ground-task-runtime-65446d7bcc-527dl_openfaas-fn_ground-task-runtime-b12db1d88da3a582965a7ff372367d9676e9e640f505694022c6f5da97649e46.log
2024-01-17 15:48:18 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer

Additional context

we have a daemonset in a cluster running from about 22d ago where we are not seeing the invalid pointer issue

smparekh avatar Jan 17 '24 15:01 smparekh