fluentd-output-sumologic Kubernetes metadata no longer visible to plugin

Before upgrading to 1.4.1, we used to dynamically set our Sumo source/host/category based on the K8s metadata, as follows:

    <match **>
      @type sumologic
      endpoint https://endpoint1.collection.us2.sumologic.com/receiver/v1/http/XXXX
      log_format json
      source_category ${record['kubernetes']['namespace_name']}
      source_name ${record['kubernetes']['container_name']}
      source_host ${record['kubernetes']['pod_name']}
      open_timeout 10
    </match>

With 1.4.1, we're getting the above hardcoded string values (i.e...) instead of the dynamic K8s metadata. I can still access the tag values with something like ${tag[n]} (this used to be tag_parts[n], which no longer works either). Is this intentional and expected, or am I doing something wrong?

Jun 20 '19 21:06 flynnecf2

@flynnecf2 any reason you are not using our Kubernetes FluentD Plugin to send the data? It offers this same functionality.

https://github.com/SumoLogic/fluentd-kubernetes-sumologic

Jun 27 '19 03:06 frankreno

Also, what version were you on before this, so we can chase this down?

Jun 27 '19 03:06 frankreno

I'm running into the same issue. @frankreno the fluentd plugin has now been deprecated. Is there any workaround for this? I cannot use https://github.com/SumoLogic/sumologic-kubernetes-collection

Feb 09 '21 21:02 malcolmrebughini

@malcolmrebughini - Can you share why you cannot use the new collection you linked to?

Feb 09 '21 23:02 frankreno

@frankreno it is an existing fluentd configuration and I would prefer to keep changes on the cluster to a minimum.

Feb 10 '21 01:02 malcolmrebughini

Can you please share your config? I can take a look and see what I can determine.

Unfortunately we have no plans to modify this plugin to support the kind of dynamic generation at this moment. We would of course welcome a PR.

Our new collection process preserves the metadata and sends it via HTTP headers instead of in the log line, which is more cost effective in terms of bytes ingested and has many other benefits. I would definitely recommend upgrading to it when you can, as it is now the supported solution.
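To make the "metadata in HTTP headers" idea concrete, here is a minimal Ruby sketch. It assumes the standard Sumo Logic HTTP source headers (X-Sumo-Category, X-Sumo-Name, X-Sumo-Host); `build_request` is a hypothetical helper for illustration, not code from the new collection, and the endpoint is the placeholder URL from the original post. The request is only built, never sent.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Placeholder endpoint copied from the thread; XXXX stands in for a real collector token.
ENDPOINT = URI('https://endpoint1.collection.us2.sumologic.com/receiver/v1/http/XXXX')

# Hypothetical helper: put the Kubernetes metadata into headers and
# keep only the log line itself in the request body.
def build_request(record)
  k8s = record.fetch('kubernetes', {})
  req = Net::HTTP::Post.new(ENDPOINT)
  req['X-Sumo-Category'] = k8s['namespace_name'].to_s
  req['X-Sumo-Name'] = k8s['container_name'].to_s
  req['X-Sumo-Host'] = k8s['pod_name'].to_s
  req.body = JSON.generate('log' => record['log'])
  req
end

record = {
  'log' => 'hello',
  'kubernetes' => { 'namespace_name' => 'payments', 'container_name' => 'api', 'pod_name' => 'api-1' }
}
request = build_request(record)
```

Because the metadata travels in the headers, the body carries only the log content, which is where the savings in ingested bytes would come from.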

Feb 10 '21 02:02 frankreno

Here's the config:

    <system>
      log_level debug
    </system>


    <source>
      @type tail
      @label @containers
      path /var/log/containers/*.log
      exclude_path ["/var/log/containers/cloudwatch-agent*", "/var/log/containers/fluentd*", "/var/log/containers/kube*", "/var/log/containers/monitoring*", "/var/log/containers/calico*"]
      pos_file /var/log/td-agent/fluentd-docker.pos
      tag core.*
      read_from_head false
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <label @containers>
      <filter core.**>
        @type kubernetes_metadata
        @log_level debug
        annotation_match [".*"]
        de_dot false
        tag_to_kubernetes_name_regexp ".+?\\.containers\\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\\.log$"
        container_name_to_kubernetes_regexp "^(?<name_prefix>[^_]+)_(?<container_name>[^\\._]+)(\\.(?<container_hash>[^_]+))?_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[^_]+$"
      </filter>

      <match core.**>
        @type copy
        <store>
          @type sumologic
          @log_level debug
          endpoint "#{ENV['SUMOLOGIC_ENDPOINT']}"
          log_format json_merge
          log_key log
          source_category "#{ENV['ENV']}/core/${record['kubernetes']['container_name']}"
          source_name ${record['kubernetes']['labels']['app']}
          open_timeout 10
        </store>
        <store>
            ...s3 config
        </store>
      </match>
    </label>

Feb 10 '21 02:02 malcolmrebughini

I've found the code change that caused this.

In 1.4.0 there was a function called expand_param that looked for record references. It was replaced in 1.4.1 with extract_placeholders. I'm not very familiar with Ruby, so I'm not sure where that function comes from. It seems to come from fluentd itself?
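As a rough illustration of what per-record expansion means, here is a minimal Ruby sketch. It assumes the old behaviour amounted to resolving `${record[...]}` references against each record; `expand_from_record` is a hypothetical helper written for this comment, not the plugin's actual expand_param.

```ruby
# Hypothetical illustration only; this is NOT the plugin's real expand_param.
# It resolves ${record['a']['b']} style references against a single record hash.
def expand_from_record(template, record)
  template.gsub(/\$\{record((?:\['[^']+'\])+)\}/) do
    # Group 1 holds the whole key chain, e.g. ['kubernetes']['namespace_name'].
    keys = Regexp.last_match(1).scan(/\['([^']+)'\]/).flatten
    value = record.dig(*keys)
    value.nil? ? '' : value.to_s
  end
end

record = { 'kubernetes' => { 'namespace_name' => 'payments', 'container_name' => 'api' } }
category = expand_from_record("${record['kubernetes']['namespace_name']}", record)
untouched = expand_from_record('${tag[1]}', record)
```

A substitution step that does not know the `${record[...]}` syntax would leave such a string as-is, which would match the hardcoded values reported at the top of this issue.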

Feb 10 '21 02:02 malcolmrebughini

After digging a bit: in newer versions of the Fluentd API, the proper way of doing this is to add a buffer section and then reference the chunk keys as $.path.to.something:

      <match rewrite.**>
        @type copy
        <store>
          @type sumologic
          @log_level debug
          endpoint "#{ENV['SUMOLOGIC_ENDPOINT']}"
          log_key log
          source_category "#{ENV['ENV']}/core/${$.kubernetes.container_name}"
          source_name TESTING
          source_host ${$.kubernetes.pod_name}
          open_timeout 10

          <buffer $.kubernetes.container_name, $.kubernetes.pod_name>
            @type memory
          </buffer>
        </store>
      </match>
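My understanding of why the buffer section is needed can be sketched in Ruby as below. This is a simplified model under my own assumptions, not Fluentd's implementation: records are grouped into chunks by the values of the chunk keys, and placeholders are then filled once per chunk from those values. `lookup`, `group_into_chunks` and `fill_placeholders` are hypothetical names.

```ruby
# Simplified model of chunk keys; not Fluentd's actual code.
CHUNK_KEYS = ['$.kubernetes.container_name', '$.kubernetes.pod_name'].freeze

# Turn "$.a.b" into ["a", "b"] and look the value up in the record.
def lookup(record, chunk_key)
  record.dig(*chunk_key.sub(/\A\$\./, '').split('.'))
end

# Group records so every record in a chunk shares the same chunk-key values.
def group_into_chunks(records)
  records.group_by { |r| CHUNK_KEYS.map { |k| [k, lookup(r, k)] }.to_h }
end

# Replace ${$.path} placeholders with the values stored for the chunk.
def fill_placeholders(template, chunk_values)
  template.gsub(/\$\{(\$\.[^}]+)\}/) { chunk_values.fetch(Regexp.last_match(1), '').to_s }
end

records = [
  { 'log' => 'a', 'kubernetes' => { 'container_name' => 'api', 'pod_name' => 'api-1' } },
  { 'log' => 'b', 'kubernetes' => { 'container_name' => 'api', 'pod_name' => 'api-1' } },
  { 'log' => 'c', 'kubernetes' => { 'container_name' => 'worker', 'pod_name' => 'worker-7' } }
]
chunks = group_into_chunks(records)
source_hosts = chunks.keys.map { |values| fill_placeholders('${$.kubernetes.pod_name}', values) }
```

In this model a field that is not listed as a chunk key has no single value per chunk, which is why only keys named in the buffer section can be used as placeholders.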

I think this solves the issue. So feel free to close this. (And sorry for resurrecting such an old issue)

Feb 10 '21 16:02 malcolmrebughini

No sorry needed, glad you were able to get this to work. I do hope you will look at updating to the newer supported collection method in the future.

Feb 10 '21 16:02 frankreno
