fluent-bit
kubernetes filter plugin fails silently without attaching metadata
Bug Report
Describe the bug
In Kubernetes, we are using a Log Forwarder (one Fluent Bit per node) + Log Aggregator (a few Fluentd instances per cluster) infrastructure to collect logs.
We are seeing the following errors in the Fluentd log:
2019-06-22 21:56:46 +0000 [error]: #4 fluent/log.rb:362:error: unexpected error on reading data host="192.168.1.66" port=50828 error_class=NoMethodError error="undefined method `[]' for nil:NilClass"
2019-06-22 21:56:46 +0000 [error]: #4 cool.io/io.rb:186:on_readable: suppressed same stacktrace
2019-06-22 21:56:55 +0000 [warn]: #4 fluent/log.rb:342:warn: emit transaction failed: error_class=NoMethodError error="undefined method `[]' for nil:NilClass" location="/opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-record-modifier-2.0.1/lib/fluent/plugin/out_record_modifier.rb:208:in `expand'" tag="k8s_container.var.log.containers.stackdriver-metadata-agent-cluster-level-6dc44559d4-7vr7b_kube-system_metadata-agent-446a1977023043116597f53d24c75c7d5a64d6d85ef6f279dc94c7b59c27eb4f.log"
After taking a closer look, it appears that the kubernetes filter plugin for Fluent Bit sometimes does not attach the additional Kubernetes metadata fields as expected. As a result, record["kubernetes"]["namespace_name"] is not reliably present.
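This matches the stack trace above: the record_modifier expansion on the aggregator indexes into record["kubernetes"] unconditionally, so a record that arrives without the metadata blows up inside `expand`. A minimal illustration in plain Ruby (the record contents below are hypothetical, not taken from our logs):

    # A record that arrived without the kubernetes filter's metadata attached:
    record = { "log" => "some container output", "stream" => "stdout" }
    # The aggregator-side expansion then calls [] on nil:
    record["kubernetes"]["namespace_name"]
    # => NoMethodError: undefined method `[]' for nil:NilClass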
Expected behavior
If the kubernetes filter plugin fails to query the API for metadata, it should either retry within the plugin or error out, so that the log entry can be put back in the queue and re-processed later. Right now it appears to fail silently, passing the log record on to the forward output plugin without attaching the expected metadata.
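Even making the failure visible on the Fluent Bit side would help. As a sketch of what we mean (untested; newer Fluent Bit releases ship an expect filter for this kind of assertion, and we have not verified whether anything equivalent is available on v1.1.3), a check placed after the kubernetes filter could at least flag records that were not enriched:

[FILTER]
    # Hypothetical sanity check; requires a Fluent Bit version that includes the expect filter.
    Name       expect
    Match      k8s_container.*
    key_exists kubernetes
    action     warn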
Versions
Fluent Bit v1.1.3
Fluentd v1.4.2
fluent-plugin-record-modifier v2.0.1
Kubernetes v1.12.7
Fluent Bit Configuration
[INPUT]
    # https://docs.fluentbit.io/manual/input/tail
    Name              tail
    Tag               k8s_container.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/fluent-bit-k8s-container.db
    Buffer_Chunk_Size 512KB
    Buffer_Max_Size   5M
    Rotate_Wait       30
    Mem_Buf_Limit     30MB
    Skip_Long_Lines   On
    Refresh_Interval  10
    storage.type      filesystem
[FILTER]
    # https://docs.fluentbit.io/manual/filter/kubernetes
    Name        kubernetes
    Match       k8s_container.*
    Kube_URL    https://kubernetes.default.svc.cluster.local:443
    Annotations Off
[OUTPUT]
    # https://docs.fluentbit.io/manual/output/forward
    Name        forward
    Match       *
    Host        stackdriver-log-aggregator-in-forward.kube-system.svc.cluster.local
    Port        8989
    Retry_Limit False
Fluentd Configuration
<source>
  @type forward
  port 8989
  bind 0.0.0.0
</source>
<match k8s_container.**>
  @type record_modifier
  <record>
    "logging.googleapis.com/local_resource_id" ${"k8s_container.#{record["kubernetes"]["namespace_name"]}.#{record["kubernetes"]["pod_name"]}.#{record["kubernetes"]["container_name"]}"}
    _dummy_labels_ ${if record['kubernetes'].has_key?('labels') && record['kubernetes']['labels'].is_a?(Hash); then; record["logging.googleapis.com/labels"] = record['kubernetes']['labels'].map{ |k, v| ["k8s-pod/#{k}", v]}.to_h; end; nil}
  </record>
  tag ${if record['stream'] == 'stderr' then 'stderr' else 'stdout' end}
  remove_keys kubernetes,_dummy_labels_
</match>
<system>
  workers 10
  root_dir /stackdriver-log-aggregator-persistent-volume
</system>
<match fluent.**>
  @type null
</match>
<match **>
  @type google_cloud
  @id google_cloud
  ...
</match>
Any suggestion or workaround is welcome as well, as long as it gives us a way to force a retry and make sure the kubernetes metadata is always present.
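For reference, a nil-safe variant of the local_resource_id expansion on the Fluentd side would at least turn the hard NoMethodError into something filterable. This is only an untested sketch; the empty-hash fallback produces records with empty name segments that would still have to be excluded or rerouted before the google_cloud output:

  <record>
    # Untested sketch: fall back to an empty hash so a missing "kubernetes" key
    # no longer raises NoMethodError inside record_modifier's expand.
    "logging.googleapis.com/local_resource_id" ${ k8s = record["kubernetes"] || {}; "k8s_container.#{k8s["namespace_name"]}.#{k8s["pod_name"]}.#{k8s["container_name"]}" }
  </record>

It does not solve the underlying problem, though; what we are really after is a retry or a hard error on the Fluent Bit side so the metadata is guaranteed to be present.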