
kubernetes filter plugin fails silently without attaching metadata

Open qingling128 opened this issue 5 years ago • 24 comments

Bug Report

Describe the bug In Kubernetes, we collect logs with a Log Forwarder (one Fluent Bit instance per node) + Log Aggregator (a few Fluentd instances per cluster) setup.

We are seeing the following errors in the Fluentd log:

    2019-06-22 21:56:46 +0000 [error]: #4 fluent/log.rb:362:error: unexpected error on reading data host="192.168.1.66" port=50828 error_class=NoMethodError error="undefined method `[]' for nil:NilClass"
    2019-06-22 21:56:46 +0000 [error]: #4 cool.io/io.rb:186:on_readable: suppressed same stacktrace
    2019-06-22 21:56:55 +0000 [warn]: #4 fluent/log.rb:342:warn: emit transaction failed: error_class=NoMethodError error="undefined method `[]' for nil:NilClass" location="/opt/google-fluentd/embedded/lib/ruby/gems/2.4.0/gems/fluent-plugin-record-modifier-2.0.1/lib/fluent/plugin/out_record_modifier.rb:208:in `expand'" tag="k8s_container.var.log.containers.stackdriver-metadata-agent-cluster-level-6dc44559d4-7vr7b_kube-system_metadata-agent-446a1977023043116597f53d24c75c7d5a64d6d85ef6f279dc94c7b59c27eb4f.log"

After taking a closer look, it seems that the kubernetes filter plugin for Fluent Bit sometimes does not attach the additional Kubernetes metadata fields as expected. As a result, record["kubernetes"]["namespace_name"] is not reliably present.
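To make the failure mode concrete, the two record shapes arriving at Fluentd look roughly like this (schematic, field values invented for illustration); the second shape is what makes the record["kubernetes"]["namespace_name"] lookup raise NoMethodError in out_record_modifier:

    {"log": "...", "stream": "stdout", "kubernetes": {"namespace_name": "kube-system", "pod_name": "...", "container_name": "..."}}
    {"log": "...", "stream": "stdout"}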

Expected behavior If the kubernetes filter plugin fails to query the Kubernetes API for metadata, it should either retry within the plugin or error out, so that the log entry can be put back in the queue and re-processed later. Right now it appears to fail silently and simply passes the log record on to the forward output plugin without attaching the expected metadata.

Versions

- Fluent Bit v1.1.3
- Fluentd v1.4.2
- fluent-plugin-record-modifier v2.0.1
- Kubernetes v1.12.7

Fluent Bit Configuration

    [INPUT]
        # https://docs.fluentbit.io/manual/input/tail
        Name               tail
        Tag                k8s_container.*
        Path               /var/log/containers/*.log
        Parser             docker
        DB                 /var/log/fluent-bit-k8s-container.db
        Buffer_Chunk_Size  512KB
        Buffer_Max_Size    5M
        Rotate_Wait        30
        Mem_Buf_Limit      30MB
        Skip_Long_Lines    On
        Refresh_Interval   10
        storage.type       filesystem

    [FILTER]
        # https://docs.fluentbit.io/manual/filter/kubernetes
        Name                kubernetes
        Match               k8s_container.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Annotations         Off

    [OUTPUT]
        # https://docs.fluentbit.io/manual/output/forward
        Name        forward
        Match       *
        Host        stackdriver-log-aggregator-in-forward.kube-system.svc.cluster.local
        Port        8989
        Retry_Limit False
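One knob we have not ruled out on the Fluent Bit side is the kubernetes filter's Buffer_Size option, which caps how much of the API server response the filter reads; our assumption (not confirmed) is that a too-small buffer could make a metadata lookup come back empty. A sketch with an enlarged buffer, value purely illustrative:

    [FILTER]
        # https://docs.fluentbit.io/manual/filter/kubernetes
        Name                kubernetes
        Match               k8s_container.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Annotations         Off
        # Assumption: the default 32k API-response buffer may be too small for
        # pods with large metadata; 256k here is just an example value.
        Buffer_Size         256k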

Fluentd Configuration

    <source>
      @type forward
      port 8989
      bind 0.0.0.0
    </source>

    <match k8s_container.**>
      @type record_modifier
      <record>
        "logging.googleapis.com/local_resource_id" ${"k8s_container.#{record["kubernetes"]["namespace_name"]}.#{record["kubernetes"]["pod_name"]}.#{record["kubernetes"]["container_name"]}"}
        _dummy_labels_ ${if record['kubernetes'].has_key?('labels') && record['kubernetes']['labels'].is_a?(Hash); then; record["logging.googleapis.com/labels"] = record['kubernetes']['labels'].map{ |k, v| ["k8s-pod/#{k}", v]}.to_h; end; nil}
      </record>
      tag ${if record['stream'] == 'stderr' then 'stderr' else 'stdout' end}
      remove_keys kubernetes,_dummy_labels_
    </match>

    <system>
      workers 10
      root_dir /stackdriver-log-aggregator-persistent-volume
    </system>

    <match fluent.**>
      @type null
    </match>

    <match **>
      @type google_cloud
      @id google_cloud
       ...
    </match>

Any suggestion or workaround is welcome as well, as long as it gives us a way to force a retry and make sure the Kubernetes metadata is always present.
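As a stopgap on the Fluentd side, we could make the record_modifier expression nil-safe so the aggregator at least stops raising NoMethodError; this only papers over the problem rather than recovering the metadata. A minimal sketch, assuming Hash#dig is available in the embedded Ruby (2.4.0 per the stack trace) and using "unknown" as a placeholder value of our own choosing:

    <match k8s_container.**>
      @type record_modifier
      <record>
        # Fall back to "unknown" whenever the kubernetes map was not attached upstream.
        "logging.googleapis.com/local_resource_id" ${"k8s_container.#{record.dig('kubernetes', 'namespace_name') || 'unknown'}.#{record.dig('kubernetes', 'pod_name') || 'unknown'}.#{record.dig('kubernetes', 'container_name') || 'unknown'}"}
      </record>
      remove_keys kubernetes
    </match>

That would keep the pipeline flowing, but the real ask remains a retry (or an explicit error) inside the Fluent Bit filter so the metadata is actually present.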
