v1.2-debian-cloudwatch crashes where v0.12 works (k8s 1.8.13, CoreOS 1745.5.0)
Deploying for Kubernetes 1.8.13 on CoreOS 1745.5.0 using fluent/fluentd-kubernetes-daemonset.
Deploying with v0.12-debian-cloudwatch works great, as it has in the past. However, after switching to v1.2-debian-cloudwatch, every Pod on every node crashes after about one minute of run time. Occasionally they manage to create a log stream and even ship some entries first, but they always crash. They keep getting restarted, and they just crash again. They also stay in step with each other: after a while they all show exactly the same restart count (e.g. 12), so I'm guessing each run lasts the same amount of time before crashing.
Everything else about the config is unchanged. I wondered whether the Debian image needed more memory, so I removed the memory limit, but on every node in the cluster the container would still run for maybe a minute and then crash.
2018-06-15 21:38:39 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2018-06-15 21:38:46 +0000 [info]: using configuration file: <ROOT>
<match fluent.**>
@type null
</match>
<source>
@type tail
path "/var/log/containers/*.log"
pos_file "/var/log/fluentd-containers.log.pos"
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag "kubernetes.*"
format json
read_from_head true
<parse>
time_format %Y-%m-%dT%H:%M:%S.%NZ
@type json
time_type string
</parse>
</source>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<filter kubernetes.**>
@type record_transformer
enable_ruby true
<record>
kubehost ${record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")}
</record>
</filter>
<match kubernetes.**>
@type cloudwatch_logs
log_group_name "anthill-cluster-containers"
log_stream_name_key "kubehost"
remove_log_group_name_key true
auto_create_stream true
put_log_events_retry_limit 20
</match>
</ROOT>
2018-06-15 21:38:46 +0000 [info]: starting fluentd-1.2.2 pid=5 ruby="2.3.3"
2018-06-15 21:38:46 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby2.3", "-Eascii-8bit:ascii-8bit", "/fluentd/vendor/bundle/ruby/2.3.0/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--gemfile", "/fluentd/Gemfile", "--under-supervisor"]
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-cloudwatch-logs' version '0.5.0'
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.1.2'
2018-06-15 21:38:50 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.1'
2018-06-15 21:38:50 +0000 [info]: gem 'fluentd' version '1.2.2'
2018-06-15 21:38:50 +0000 [info]: adding match pattern="fluent.**" type="null"
2018-06-15 21:38:50 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2018-06-15 21:38:54 +0000 [info]: adding filter pattern="kubernetes.**" type="record_transformer"
2018-06-15 21:38:54 +0000 [info]: adding match pattern="kubernetes.**" type="cloudwatch_logs"
2018-06-15 21:38:57 +0000 [info]: adding source type="tail"
2018-06-15 21:38:57 +0000 [info]: #0 starting fluentd worker pid=16 ppid=5 worker=0
2018-06-15 21:38:57 +0000 [info]: #0 following tail of /var/log/containers/kube-prometheus-exporter-node-fwnkt_prometheus_node-exporter-1412af047f962327fb4e3f7949fac5028ae156606e68d064240a78d37fd8af65.log
2018-06-15 21:38:57 +0000 [info]: #0 following tail of /var/log/containers/kube-node-drainer-ds-bghgj_kube-system_main-7a733ef08fe677ea9c3998026c6e3149b30ffbf031c9ddfba8450dcb9ce8dae6.log
2018-06-15 21:38:57 +0000 [info]: #0 disable filter chain optimization because [Fluent::Plugin::KubernetesMetadataFilter, Fluent::Plugin::RecordTransformerFilter] uses `#filter_stream` method.
My config:
<match fluent.**>
@type null
</match>
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag kubernetes.*
format json
read_from_head true
</source>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<filter kubernetes.**>
@type record_transformer
enable_ruby true
<record>
kubehost ${record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")}
</record>
</filter>
<match kubernetes.**>
@type cloudwatch_logs
log_group_name "#{ENV['LOG_GROUP_NAME']}"
log_stream_name_key kubehost
remove_log_group_name_key true
auto_create_stream true
put_log_events_retry_limit 20
</match>
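As a side note on the record_transformer filter: the enable_ruby expression is meant to fall back to a placeholder stream name whenever the kubernetes_metadata filter hasn't enriched a record. A minimal standalone sketch of that expression (the sample records below are made up, not real log records):

```ruby
# The same fetch chain the record_transformer filter evaluates per record:
# missing "kubernetes" key -> empty hash -> missing "host" -> "unknown_host".
def kubehost(record)
  record.fetch("kubernetes", Hash.new).fetch("host", "unknown_host")
end

enriched = { "log" => "...", "kubernetes" => { "host" => "ip-10-0-0-1" } } # hypothetical enriched record
bare     = { "log" => "..." }                                             # record with no metadata attached

puts kubehost(enriched) # => ip-10-0-0-1
puts kubehost(bare)     # => unknown_host
```

So even records that miss metadata enrichment should still get routed to an "unknown_host" stream rather than raising, which is why I don't suspect this filter itself.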
Does anyone have a good idea about this issue? Vanilla fluentd v1.2 doesn't have this problem, so we'd like to understand what is going wrong here.