Implement default fallback option when using templates in autodiscover

Open exekias opened this issue 7 years ago • 31 comments

When defining templates in autodiscover, it would be nice to have a default fallback to use when none of them matches, something like this:

filebeat.autodiscover:
  providers:
   - type: docker
     templates:
       - condition:
           contains:
             docker.container.name: "nginx"
         config:
           - module: nginx
             access:
               prospector:
                 type: docker
                 containers.stream: stdout
                 container.ids:
                   - "${data.docker.container.id}"
     # Fallback to docker prospector (without parsing) for unknown containers:
     default:
       - type: docker
         container.ids:
           - "${data.docker.container.id}"

exekias avatar Jan 16 '18 11:01 exekias

👍 on this suggestion. Another option for the config would be the following:

filebeat.autodiscover:
  providers:
   - type: docker
     templates:
       - default: true
         config:
           container.ids:
             - "${data.docker.container.id}"
       - condition:
           contains:
             docker.container.name: "nginx"
         config:
           - module: nginx
             access:
               prospector:
                 type: docker
                 containers.stream: stdout
                 container.ids:
                   - "${data.docker.container.id}"
  • I think there should be one default per type defined. This allows a different default for each type, and if type: docker is defined multiple times, it would allow one default for each. I think this could allow interesting combinations. I wonder whether the indentation in your example above should have been two spaces to the left, to be on the same level as providers?
  • The option above would allow a single format for all config options instead of having it in two places. I'm hoping that simplifies the code.
  • In the above case, there is no "global" fallback if someone uses multiple providers. Do we need that?

ruflin avatar Jan 16 '18 23:01 ruflin

Hi!

How is a condition.contains[] met today?

If it is met only when all the sub-conditions in it are true, i.e. true && met(contains[0]) && met(contains[1]) && ... met(contains[n-1]), I'd be happy with multiple levels of defaults to cover my needs. For example, I'd want to write:

filebeat.autodiscover:
  providers:
   - type: docker
     templates:
       - # This specific type of containers/pods in this specific namespace has its own log format
         condition:
           contains:
             kubernetes.pod.name: "mixer"
             kubernetes.namespace.name: "istio-system"
         config:
           - module: istio-mixer
             log:
               prospector:
                 type: docker
                 containers.stream: stdout
                 container.ids:
                   - "${data.docker.container.id}"
       - # This is the default for the "istio-system" namespace.
         # "Almost" all the containers/pods in this specific namespace would have a uniform log format
         condition:
           contains:
             kubernetes.namespace: "istio-system"
         config:
           - module: istio
             log:
               prospector:
                 type: docker
                 containers.stream: stdout
                 container.ids:
                   - "${data.docker.container.id}"
       - # This is the default for our modern apps of `type: docker`. Assume it emits structured logs as the pod is annotated with its modernity.
         condition:
           anyof:
             kubernetes.pod.annotations:
               contains: "i-am-modern-ndjson-logging-app"
         config:
           - module: ndjson-logging-app
             log:
               prospector:
                 type: docker
                 containers.stream: stdout
                 container.ids:
                   - "${data.docker.container.id}"
       - # This is the global default per `type: docker`. We assume it emits unstructured logs
         config:
           - module: mylegacyapp
             log:
               prospector:
                 type: docker
                 containers.stream: stdout
                 container.ids:
                   - "${data.docker.container.id}"

mumoshu avatar Jan 18 '18 02:01 mumoshu

@mumoshu Have been trying to find documentation for the conditionals. Do all of these work?

rossedman avatar Mar 09 '18 12:03 rossedman

Hi @rossedman, the configuration seen in my above comment is just a suggestion! It isn't implemented as of today.

Does it look good to you? I had been willing to contribute a PR once I get some approval and/or support on the suggestion but was unable to do so due to.. the silence 😉

mumoshu avatar Mar 09 '18 13:03 mumoshu

Sorry for the late response @mumoshu, I missed your question while going over email :innocent:

The answer is yes, contains must match all the fields in the given map :)
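
As a minimal sketch of that behaviour (the container name and image values are made up for illustration), a template like the following should only apply when both fields match:

```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              # Both entries must match for the condition to be met (logical AND).
              docker.container.name: "nginx"        # hypothetical container name
              docker.container.image: "mycompany/"  # hypothetical image substring
          config:
            - type: docker
              containers.ids:
                - "${data.docker.container.id}"
```

A container named nginx that runs an image without "mycompany/" in its name would not get this config.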

@rossedman you can find more info about conditions here: https://www.elastic.co/guide/en/beats/metricbeat/6.2/defining-processors.html#conditions

exekias avatar Mar 11 '18 23:03 exekias

Hi @exekias, I'm trying to implement OR logic using multiple condition statements in filebeat.yml, but it doesn't work.

       - condition:
           or:
              - contains:
                 docker.container.name: "image1"
              - contains:
                 docker.container.name: "image2"

Is there a way to achieve this without duplicating the template for each condition, as below?

       - condition:
           contains:
             docker.container.name: "image1"


       - condition:
           contains:
             docker.container.name: "image2"

mvasilenko avatar Apr 20 '18 10:04 mvasilenko

@mvasilenko I tested this and it worked for me: https://gist.github.com/exekias/e802ef376fdbbd4ba5872b57af4128bf

You may want to review your config, "image1" is repeated

exekias avatar Apr 20 '18 14:04 exekias

Good suggestion. Until something of this kind is implemented, as far as I understand, we have to resort to contortions like this:

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          templates:
            - condition:
                or:
                  - contains:
                      kubernetes.pod.name: internal-api
                  - contains:
                      kubernetes.pod.name: customer-api
                  - contains:
                      kubernetes.pod.name: exporter
              config:
                - type: docker
                  containers.ids:
                    - "${data.kubernetes.container.id}"
                  multiline.pattern: '^[[:space:]]+((at|\.{3})\b|^Caused by:)?'
                  multiline.negate: false
                  multiline.match: after
            - condition:
                and:
                  - not:
                      contains:
                        kubernetes.pod.name: internal-api
                  - not:
                      contains:
                        kubernetes.pod.name: customer-api
                  - not:
                      contains:
                        kubernetes.pod.name: exporter
              config:
                - type: docker
                  containers.ids:
                    - "${data.kubernetes.container.id}"

bagratte avatar Aug 07 '18 10:08 bagratte

https://github.com/elastic/beats/pull/9029 was just merged, which brings the ability to define a configuration without conditions. Conditions are matched in order, so if you put this one at the end it will act as the default. As I think this solves the issue, I'm proceeding to close it.

This is an example config that will be possible with this change:

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          templates:
            - condition:
                or:
                  - contains:
                      kubernetes.pod.name: internal-api
                  - contains:
                      kubernetes.pod.name: customer-api
                  - contains:
                      kubernetes.pod.name: exporter
              config:
                - type: docker
                  containers.ids:
                    - "${data.kubernetes.container.id}"
                  multiline.pattern: '^[[:space:]]+((at|\.{3})\b|^Caused by:)?'
                  multiline.negate: false
                  multiline.match: after
            # Default:
            - config:
                - type: docker
                  containers.ids:
                    - "${data.kubernetes.container.id}"

exekias avatar Nov 29 '18 17:11 exekias

I tried that configuration with Docker containers only, and it fails by parsing the Docker log files multiple times.

Consider the following config:

filebeat.autodiscover:
  providers:
    # Provider for our docker containers
    - type: docker
      templates:
        # Template for the spring boot json logging containers
        - condition:
            contains:
              docker.container.image: myuser/myimage
          config:
            - type: docker
              containers:
                ids:
                  - ${data.docker.container.id}
              encoding: utf-8
              json:
                keys_under_root: true
                add_error_key: true
                message_key: "message"
                overwrite_keys: true
                match: after
              fields:
                log.format.content: "json"
                log.format.layout: "spring-boot"
        - condition:
          config:
            - type: docker
              containers:
                ids:
                  - ${data.docker.container.id}
              encoding: utf-8
              fields:
                log.format.content: "plain"
                log.format.layout: "spring"

When I check this configuration with filebeat 6.6.2, it tells me that the config is ok. When I start this configuration, my expected behavior is:

  1. The log from the container myuser/myimage is using the template for json logging
  2. All other containers are using the default template

What happens is:

  1. The log from the container myuser/myimage is harvested using the given template
  2. The log of all containers including myuser/myimage is harvested using the default template.

Therefore the logs for the container using image myuser/myimage are harvested twice.

Is this the expected behavior? And if so, can the second log stream be suppressed in any way?

FSeidinger avatar Mar 22 '19 11:03 FSeidinger

Same here. I don't think this is expected behavior.

lukyanov avatar Mar 23 '19 10:03 lukyanov

I also tried to use the default condition with filebeat 6.6.2. My config is as follows:

autodiscover:
        providers:
          - type: kubernetes
            templates:
              - condition:
                  equals:
                    kubernetes.labels.type: java
                config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} ' 
                    multiline.negate: true 
                    multiline.match: after
                    ignore_older: 48h
              - condition:
                  contains:
                    kubernetes.labels.app: nginx
                config:
                  - module: nginx
                    access:
                      input:
                        type: docker
                        containers.stream: stdout
                        containers.ids:
                          - "${data.kubernetes.container.id}"
                        ignore_older: 48h
                    error:
                      input:
                        type: docker
                        containers.stream: stderr
                        containers.ids:
                          - "${data.kubernetes.container.id}"
                        ignore_older: 48h
              - config:
                  - type: docker
                    containers.ids:
                      - "${data.kubernetes.container.id}"
                    ignore_older: 48h

The result is that my logs are stored twice. For example, for java apps which have kubernetes.labels.type: java I have a multiline doc with parsed fields (loglevel, class, etc) AND a raw doc without these fields.

olivierboudet avatar Jul 18 '19 10:07 olivierboudet

Simple example to reproduce: https://gist.github.com/olivierboudet/796a240577ea7f9fcb5a6f25a7114e6c In the configuration, I added a condition to ignore all logs from the filebeat container. If I uncomment the last part (i.e. the default config), all logs are processed, including those from the filebeat container.

olivierboudet avatar Jul 18 '19 12:07 olivierboudet

Any news on this? Does the unconditional config work or not?

willemdh avatar Oct 07 '19 10:10 willemdh

Sorry for the late response. We released default_config setting for hints based autodiscover, check how it works here: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover-hints.html#_kubernetes_2

Check hints.default_config setting. I would say it fulfills what we are seeking with this issue
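
A minimal sketch of what that can look like, assuming a Kubernetes provider and container logs under /var/log/containers (the log path is an assumption and depends on your container runtime):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      # Used as the default config for discovered containers;
      # per-pod hints (annotations) can still override it.
      hints.default_config:
        type: container
        paths:
          # Assumed log location; adjust to your runtime and mounts.
          - /var/log/containers/*${data.kubernetes.container.id}.log
```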

exekias avatar Oct 07 '19 13:10 exekias

@exekias Using this currently:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      in_cluster: false
      host: ${HOSTNAME}
      kube_config: /home/svcaccount/.kube/config
      hints_enabled: true
      templates:
        - condition:
            contains:
              kubernetes.container.image: redis
          config:
            - module: redis
              log:
                enabled: true
                input:
                  type: container
                  paths:
                    - /var/lib/docker/containers/${data.kubernetes.container.id}/*.log
              slowlog:
                enabled: false
        - config:
            - type: container
              paths: ["/var/lib/docker/containers/${data.kubernetes.container.id}/*.log"]
              exclude_lines: ["^\\s+[\\-`('.|_]"]  # drop asciiart lines

Which seems to work without using hints.default_config. I did not immediately find any duplicates. As I'm actually not really using hints, should I still put:

        - config:
            - type: container
              paths: ["/var/lib/docker/containers/${data.kubernetes.container.id}/*.log"]
              exclude_lines: ["^\\s+[\\-`('.|_]"]  # drop asciiart lines

under hints.default_config: ?

willemdh avatar Oct 07 '19 13:10 willemdh

That's interesting. Some users reported these settings were failing for them. Do you have any redis container running? We would need to check that it falls under the redis condition and the default settings are not launched for it

exekias avatar Oct 09 '19 08:10 exekias

> Sorry for the late response. We released default_config setting for hints based autodiscover, check how it works here: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover-hints.html#_kubernetes_2
>
> Check hints.default_config setting. I would say it fulfills what we are seeking with this issue

Would you please elaborate? I couldn't get it to work :(

speechkey avatar Nov 18 '19 11:11 speechkey

Using Filebeat v7.4 I'm getting duplicates with the config below. There is a message containing the decoded JSON fields and another that apparently used the default config and did not decode the JSON. It seems like the default config acts as a "finally" step and is applied whether or not any condition was true for a given container.

filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            equals:
              container.image.name: "mycomponent"
          config:
            - type: container
              paths:
                - /usr/share/filebeat/dockerlogs/${data.docker.container.id}/*.log
              processors:
                - decode_json_fields:
                    fields: ["message"]
                    target: "json"
        - config:
            - type: container
              paths:
                - /usr/share/filebeat/dockerlogs/${data.docker.container.id}/*.log

HGS-mbayer avatar Nov 18 '19 14:11 HGS-mbayer

Is there anyone still working on this? I'm running into the same issue.

edit: running 7.5.2

futurekill avatar Feb 07 '20 15:02 futurekill

> Is there anyone still working on this? I'm running into the same issue.
>
> edit: running 7.5.2

Ditto. I'm not familiar with the Beats code, but I was looking at the pull request that made conditions in autodiscover optional, https://github.com/elastic/beats/pull/9029/files, and the template.GetConfig method seems to be applying the null condition config regardless of whether another condition is met. I thought the null condition config should only be applied if no other condition were met.
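
If that reading is right, one workaround, along the lines of bagratte's earlier comment, is to give the fallback an explicit negated condition so that it cannot overlap with the specific template. A minimal sketch, assuming the Docker provider and an image called mycomponent (both are placeholders, and the log path is an assumption about the host layout):

```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        # Specific template: JSON decoding for one image (placeholder name).
        - condition:
            contains:
              docker.container.image: "mycomponent"
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
              processors:
                - decode_json_fields:
                    fields: ["message"]
                    target: "json"
        # Fallback: explicitly excludes what the template above already handles,
        # so the same container should not be harvested twice.
        - condition:
            not:
              contains:
                docker.container.image: "mycomponent"
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
```

The downside is that every specific condition has to be repeated, negated, in the fallback, which is exactly the duplication this issue set out to avoid.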

carlsoane avatar Feb 11 '20 22:02 carlsoane

Still exists with filebeat 7.6.1

olivierboudet avatar Mar 20 '20 20:03 olivierboudet

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

botelastic[bot] avatar Feb 18 '21 21:02 botelastic[bot]

I am trying to do some workaround with appenders

    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
          hints.default_config.enabled: false
          templates:
            - condition.contains:
                kubernetes.labels.sbkslog: "true"
              config:
                - type: container
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}.log
                  add_kubernetes_metadata:
                    host: ${NODE_NAME}
                    matchers:
                    - logs_path:
                        logs_path: "/var/log/containers/"
                  processors:
                    - add_fields:
                        fields:
                          fb_conf: "sbkslog"
          appenders:
            - type: config
              config:
                - type: container
                  processors:
                    - add_fields:
                        fields:
                          fb_conf: "default"
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}.log
                  add_kubernetes_metadata:
                    host: ${NODE_NAME}
                    matchers:
                    - logs_path:
                        logs_path: "/var/log/containers/"

Using hints.default_config is not an option because I need hints.default_config.enabled: false to disable logging for all containers. I want to collect logs only from containers whose hints annotation sets enabled, and then use some conditions with a DEFAULT option.

When I remove the appenders section, I still get logs from containers that do not match the first condition in the templates section, but where is the configuration for them defined? How does Filebeat know where to collect logs?

petak-it avatar Feb 22 '21 10:02 petak-it

Are there any updates on that? Does not work on 7.6.2

Kosmonafft avatar Jun 17 '21 20:06 Kosmonafft

> How does Filebeat know where to collect logs?

We've been trying to figure this out as well. We're seeing filebeat output this on debug, for our auto discovered non-default_config'd nginx annotated pod:

{
  "access": {
    "enabled": true,
    "input": {
      "paths": [
        "/var/lib/docker/containers/769cc9f43a9f7eec58a23762a746c92130c8dee7cad4d353dfd7e89e1ac2711f/*-json.log"
      ],
      "stream": "stdout",
      "type": "container"
    }
  },
  ...
  "module": "nginx"
}

but that will not work because it's a containerd based kubernetes cluster and not docker. We can't seem to figure out how to tell filebeat to instead use /var/log/containers/*${data.kubernetes.container.id}.log for those auto discovered and discretely enabled pods.

maybe @exekias can point us in the right direction?

jam01 avatar Jul 28 '21 17:07 jam01

To answer my own question... I decided to dig into the code and found this https://github.com/elastic/beats/blob/a4e5a73af1df3020b5d50d5a198963bd66e5c370/filebeat/autodiscover/builder/hints/logs.go, which led me to trying this

      autodiscover:
        providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints:
            enabled: true
            default_config:
              enabled: false
              type: container
              paths:
              - /var/log/containers/*-${data.kubernetes.container.id}.log

which works. The default_config is applied to the events found through autodiscover hints.

jam01 avatar Jul 28 '21 18:07 jam01

> To answer my own question... I decided to dig into the code and found this https://github.com/elastic/beats/blob/a4e5a73af1df3020b5d50d5a198963bd66e5c370/filebeat/autodiscover/builder/hints/logs.go, which led me to trying this
>
>       autodiscover:
>         providers:
>         - type: kubernetes
>           node: ${NODE_NAME}
>           hints:
>             enabled: true
>             default_config:
>               enabled: false
>               type: container
>               paths:
>               - /var/log/containers/*-${data.kubernetes.container.id}.log
>
> which works. The default_config is applied to the events found through autodiscover hints.

The problem now is that it looks like you can provide exactly one default config if you are using hints-based autodiscover (at least I couldn't figure out how to do otherwise). You cannot say: "For these logs I want to use this default config if no hints are provided, and for these logs I want to use another default config."

Kosmonafft avatar Jul 29 '21 07:07 Kosmonafft

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

botelastic[bot] avatar Jul 29 '22 08:07 botelastic[bot]

Hi, I have been facing a similar issue. Here is my post: https://discuss.elastic.co/t/filebeat-not-logging-hints-enabled-container-logs/310574

Basically, I'm trying to get logs from the nginx ingress controller, which is working fine, but I can't get hints to work. I would really appreciate it if someone who got it working could post the solution either to my post or to this one.

Below is my filebeat.yaml file content:

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
  namespace: search
spec:
  type: filebeat
  version: 7.12.1
  elasticsearchRef:
    name: elastic-search
  kibanaRef:
    name: kibana-web
  config:
    filebeat.autodiscover.providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints: 
        enabled: true
        #add_resource_metadata.namespace.enabled: true
        default_config: 
          enabled: false
          type: container
          paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
      templates:
      - condition:
          contains: 
            kubernetes.pod.name: ingress
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
          
    processors:
    - add_cloud_metadata: {}
    - add_host_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        #hostNetwork: true # Allows to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            #privileged: true
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          resources:
            requests:
              memory: 200Mi
              cpu: 0.2
            limits:
              memory: 300Mi
              cpu: 0.4
              
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers

opensourcedev1 avatar Jul 30 '22 00:07 opensourcedev1