
v1.2-debian-cloudwatch crashes where v0.12 works (k8s 1.8.13 CoreOS 1745.5.0)

Open · whereisaaron opened this issue 7 years ago · 1 comment

Deploying for Kubernetes 1.8.13 on CoreOS 1745.5.0 using fluent/fluentd-kubernetes-daemonset

Deploying with v0.12-debian-cloudwatch works great, as in the past; however, after switching to v1.2-debian-cloudwatch, every Pod on every node crashes after ~1 minute of run time. Occasionally they get as far as creating a log stream and even logging some entries first, but they always crash. They keep getting restarted, but they just crash again. They also stay in step with each other: after a while they all have exactly the same number of crashes (e.g. 12), so I'm guessing they run for the same amount of time before crashing.

Everything else about the config remains unchanged. I wondered if the Debian image needed more memory, so I removed that limit, but on every node in the cluster the container would still run for maybe a minute and then crash.

2018-06-15 21:38:39 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2018-06-15 21:38:46 +0000 [info]: using configuration file: <ROOT>
  <match fluent.**>
    @type null
  </match>
  <source>
    @type tail
    path "/var/log/containers/*.log"
    pos_file "/var/log/fluentd-containers.log.pos"
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    tag "kubernetes.*"
    format json
    read_from_head true
    <parse>
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      @type json
      time_type string
    </parse>
  </source>
  <filter kubernetes.**>
    @type kubernetes_metadata
  </filter>
  <filter kubernetes.**>
    @type record_transformer
    enable_ruby true
    <record>
      kubehost ${record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")}
    </record>
  </filter>
  <match kubernetes.**>
    @type cloudwatch_logs
    log_group_name "anthill-cluster-containers"
    log_stream_name_key "kubehost"
    remove_log_group_name_key true
    auto_create_stream true
    put_log_events_retry_limit 20
  </match>
</ROOT>
2018-06-15 21:38:46 +0000 [info]: starting fluentd-1.2.2 pid=5 ruby="2.3.3"
2018-06-15 21:38:46 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby2.3", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.3.0/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "--under-supervisor"]
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-cloudwatch-logs' version '0.5.0'
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.1.2'
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.1'
2018-06-15 21:38:50 +0000 [info]: gem 'fluentd' version '1.2.2'
2018-06-15 21:38:50 +0000 [info]: adding match pattern="fluent.**" type="null"
2018-06-15 21:38:50 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2018-06-15 21:38:54 +0000 [info]: adding filter pattern="kubernetes.**" type="record_transformer"
2018-06-15 21:38:54 +0000 [info]: adding match pattern="kubernetes.**" type="cloudwatch_logs"
2018-06-15 21:38:57 +0000 [info]: adding source type="tail"
2018-06-15 21:38:57 +0000 [info]: #0 starting fluentd worker pid=16 ppid=5 worker=0
2018-06-15 21:38:57 +0000 [info]: #0 following tail of /var/log/containers/kube-prometheus-exporter-node-fwnkt_prometheus_node-exporter-1412af047f962327fb4e3f7949fac5028ae156606e68d064240a78d37fd8af65.log
2018-06-15 21:38:57 +0000 [info]: #0 following tail of /var/log/containers/kube-node-drainer-ds-bghgj_kube-system_main-7a733ef08fe677ea9c3998026c6e3149b30ffbf031c9ddfba8450dcb9ce8dae6.log
2018-06-15 21:38:57 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::KubernetesMetadataFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.

My config (fluentd v1's compatibility layer rewrites the flat format / time_format parameters into the <parse> section shown in the dump above):

  <match fluent.**>
    @type null
  </match>

  <source>
    @type tail
    path /var/log/containers/*.log
    pos_file /var/log/fluentd-containers.log.pos
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    tag kubernetes.*
    format json
    read_from_head true
  </source>

  <filter kubernetes.**>
    @type kubernetes_metadata
  </filter>

  <filter kubernetes.**>
    @type record_transformer
    enable_ruby true
    <record>
      kubehost ${record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")}
    </record>
  </filter>

  <match kubernetes.**>
    @type cloudwatch_logs
    log_group_name "#{ENV['LOG_GROUP_NAME']}"
    log_stream_name_key kubehost
    remove_log_group_name_key true
    auto_create_stream true
    put_log_events_retry_limit 20
  </match>
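
One difference between v0.12 and v1.x that may matter here: fluentd v1 outputs default to an in-memory buffer, so if PutLogEvents calls back up, the worker's memory can grow until something kills the container. Purely as a guess, here is a sketch of the same cloudwatch_logs match with an explicitly bounded file buffer (the <buffer> parameter values are illustrative assumptions, not tested against this image):

  <match kubernetes.**>
    @type cloudwatch_logs
    log_group_name "#{ENV['LOG_GROUP_NAME']}"
    log_stream_name_key kubehost
    remove_log_group_name_key true
    auto_create_stream true
    put_log_events_retry_limit 20
    # v1-style buffer section: spill chunks to disk and cap total size
    <buffer>
      @type file
      path /var/log/fluentd-cloudwatch.buffer
      chunk_limit_size 2m       # illustrative assumption
      total_limit_size 64m      # illustrative assumption
      flush_interval 5s         # illustrative assumption
      overflow_action block
    </buffer>
  </match>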

whereisaaron · Jun 15 '18 22:06

Does anyone have a good idea about this issue? Vanilla fluentd v1.2 doesn't have this problem, so we want to know what is going wrong.
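
If you can reproduce it, raising fluentd's own log level might surface a backtrace before the restart, either with the -v/-vv command-line flags or via the <system> directive; a minimal sketch:

  <system>
    log_level debug
  </system>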

repeatedly · Jul 06 '18 23:07