fluent-plugin-multi-format-parser

Use time_format and types on multi_format parse

Open carlosedp opened this issue 6 years ago • 8 comments

I currently parse one tag with a simple filter but I need to add a different pattern to the filter and I'm planning to migrate to a multi_format plugin.

My question is whether it's possible to use the time_format and types fields on each pattern, like this:

<filter kubernetes.var.log.containers.traefik-ingress-**.log>
  @type parser
  key_name log
  reserve_data yes
  <parse>
    @type multi_format
    <pattern>
      format regexp
      expression /^(?<ip>[^-]*) - - \[(?<datetime>[^\]]*)\] "(?<method>[^ ]*) (?<path>[^ ]*) (?<http_version>[^"]*)" (?<status_code>[^ ]*) (?<body_bytes>[^ ]*) "(?<referer>[^"]*)" "(?<user_agent>[^"]*)" (?<seq>[^ ]*) "(?<domain>[^ ]*)" "(?<dest_url>[^ ]*)" (?<response_time>[^ ms]*)/
      time_key datetime
      time_format %d/%b/%Y:%H:%M:%S %z
      types status_code:integer,body_bytes:integer,seq:integer,response_time:integer
    </pattern>
    <pattern>
      format regexp
      expression /^\[(?<datetime>[^\]]*)\] - (?<data>[^ ]*)/
    </pattern>
  </parse>
</filter>

carlosedp avatar Apr 24 '18 22:04 carlosedp

Probably not, because I get this when I use your example:

2019-02-15 09:35:16 +0000 [warn]: dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not match with data 'time=\"2019-02-15T09:35:16Z\" level=debug msg=\"Skipping Kubernetes event kind *v1.Endpoints\"\n'" location=nil tag="kubernetes.var.log.containers.traefik-ingress-controller-d2bv5_kube-system_traefik-ingress-lb-3f5d86da4a51d9909d67acae3fa9c8ddf1ba9f83f6b4ece55728a53b41175bae.log" time=2019-02-15 09:35:16.418231659 +0000 record={"log"=>"time=\"2019-02-15T09:35:16Z\" level=debug msg=\"Skipping Kubernetes event kind *v1.Endpoints\"\n", "stream"=>"stdout", "docker"=>{"container_id"=>"3f5d86da4a51d9909d67acae3fa9c8ddf1ba9f83f6b4ece55728a53b41175bae"}, "kubernetes"=>{"container_name"=>"traefik-ingress-lb", "namespace_name"=>"kube-system", "pod_name"=>"traefik-ingress-controller-d2bv5", "container_image"=>"traefik:latest", "container_image_id"=>"docker-pullable://traefik@sha256:79a9b27986068895c5deb438099fbd3072ed645cdcabc72af24e229f868c4cf2", "pod_id"=>"101009d3-2c79-11e9-b433-de1a34070007", "labels"=>{"controller-revision-hash"=>"696f6f7df", "k8s-app"=>"traefik-ingress-lb", "name"=>"traefik-ingress-lb", "pod-template-generation"=>"1"}, "host"=>"kube-master1", "master_url"=>"https://10.96.0.1:443/api", "namespace_id"=>"9e8bebb0-11a2-11e9-9ca2-de1a34070007"}}

sokoow avatar Feb 15 '19 09:02 sokoow
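
A note on the error above, hedged because the configuration actually in use is not shown: the message says the line `time="..." level=debug msg="..."` matched none of the patterns, which by itself says nothing about whether time_format or types are honored. Below is a minimal sketch of two extra patterns that would cover such lines. The regexp, the field names level and msg, and the use of a catch-all as the last pattern are assumptions for illustration, not something taken from the configs posted here.

```
<pattern>
  format regexp
  # Assumed shape of Traefik's own log lines: time="..." level=... msg="..."
  expression /^time="(?<time>[^"]*)" level=(?<level>[^ ]*) msg="(?<msg>.*)"$/
  time_key time
  time_format %Y-%m-%dT%H:%M:%SZ
  utc true
</pattern>
<pattern>
  # Catch-all: keeps any other line unparsed in the "message" field
  format none
</pattern>
```

These would go after the existing access-log patterns inside the same multi_format parse section, so that the more specific patterns are tried first.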

So please, does anyone know where the types field should go? I've tried multiple places and nothing happened: all my fields stay string types :-/

nargmarg avatar May 26 '20 15:05 nargmarg

@nargmarg If you have a problem, you need to post your configuration and an actual log example. We are not psychic, so it is hard to answer without that information :)

repeatedly avatar May 26 '20 18:05 repeatedly

You're right, sorry. Below is part of my fluentd config (without the output part, which isn't needed here):

<filter kubernetes.**>
   @id filter_parser
   @type parser
   key_name log
   reserve_time true
   reserve_data true
   remove_key_name_field true
   <parse>
     @type multi_format
       <pattern>
         format /(?<web.remote_addr>[^ ]*) - - \[(?<web.time>[^\]]*)\] "(?<web.method>\S+)(?: +(?<web.request>[^\"]*) +\S*)?" (?<web.code>[^ ]*) (?<web.size>[^ ]*) "(?<web.referer>[^\"]*)" "(?<web.agent>[^\"]*)" (?<web.request_length>[^ ]*) (?<web.request_time>[^ ]*) \[(?<web.namespace_service_port>[^ ]*)\] (?<web.upstream_addr>[^ ]*) (?<web.upstream_response_length>[^ ]*) (?<web.upstream_response_time>[^ ]*) (?<web.upstream_status>[^ ]*)/
         types web.code:integer,web.size:integer,web.request_length:integer,
         web.request_time:float,web.upstream_addr:array,web.upstream_response_length:integer,
         web.upstream_response_time:float,web.upstream_status:integer
       </pattern>
       <pattern>
         format /time="(?<external_dns.time>[^ ]*)" level=(?<external_dns.level>[^ ]*) msg="(?<external_dns.msg>[^\"]*)"/
       </pattern>
       <pattern>
           format json
       </pattern>
   </parse>
 </filter>

I don't know if this is the right place for the types field, but I have tried many places and nothing happened. All types stay string; nothing is converted...

Do you have any advice? Thanks a lot. (Don't pay attention to the indentation, please.)

nargmarg avatar May 28 '20 14:05 nargmarg
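
One thing that may be worth checking in the config above, offered as a guess rather than a diagnosis: the types list is spread over three lines. As far as I know, an unquoted parameter value in a Fluentd config file ends at the line break, so only the first line would be read as the value of types. A minimal sketch of a pattern with the whole list on one line is below. The shortened regexp and the underscore field names are hypothetical, chosen only to keep the example small; I am also not sure that dotted group names are accepted by Ruby's regexp engine, hence the underscores.

```
<pattern>
  format regexp
  # Hypothetical, shortened pattern; group names use underscores instead of dots
  expression /^(?<web_remote_addr>[^ ]*) (?<web_code>[^ ]*) (?<web_size>[^ ]*) (?<web_request_time>[^ ]*)$/
  # Whole list on a single line: an unquoted value ends at the line break
  types web_code:integer,web_size:integer,web_request_time:float
</pattern>
```

With a line such as `10.0.0.1 200 512 0.004`, the expectation would be that web_code and web_size arrive as integers and web_request_time as a float, provided this pattern is the one that matches.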

Same for me. I parse JSON-escaped logs from an ingress controller and all fields come out as text :/ Probably this plugin ignores the types keyword. @repeatedly Can you confirm or deny whether it's possible to define types for fields as described in https://docs.fluentd.org/configuration/parse-section#parse-parameters

hetii avatar Nov 21 '22 12:11 hetii

I was wondering if @repeatedly is still active. I've seen some stuff that would be quite useful (like open PRs), as would answers to questions like this.

I know you can use time_format in multi_format. Now I want to know if I can take messages with the same overall pattern (json), but with different names for time_key.

fabio-s-franco avatar Aug 04 '23 15:08 fabio-s-franco

> I know you can use time_format in multi_format

Yes. This plugin forwards configurations and events to the actual parser plugins, so parser features should work. I tested with a simple configuration and it works as expected.

<source>
  @type sample
  sample {"hello":"world","log":"{\"key\":\"value\",\"event_time\":\"22/Feb/2022:12:00:00 +0900\",\"num\":\"100\"}"}
  tag sample
</source>

<filter sample>
  @type parser
  key_name log
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key event_time
      time_format %d/%b/%Y:%H:%M:%S %z
      types num:integer
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</filter>

<match sample>
  @type stdout
</match>

> with different names for time_key.

What does this mean? Do incoming events have different time key names, like below?

{"k":"v1","time_key1":"time_value1"}
{"k":"v2","time_key2":"time_value2"}
{"k":"v3","time_key3":"time_value3"}
{"k":"v4","time_key2":"time_value3"}
...
{"k":"vN","time_key1":"time_valueN"}

repeatedly avatar Aug 11 '23 10:08 repeatedly

> What does this mean? incoming events have different time key names like below?

Exactly, @repeatedly. I've been struggling with this for a while and it doesn't really seem to work. For example:

<source>
  @type http
  bind 0.0.0.0
  port 5880

  <parse>
    @type multi_format
    <pattern>
      format json
      time_key Timestamp
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
    <pattern>
      format json
      time_key @t
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
    <pattern>
      format json
      time_key @timestamp
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
    <pattern>
      format json
      time_key timestamp
      keep_time_key false
      utc true
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
  </parse>

  @label @HTTP
</source>

My expectation was that it would go through the parsers until there is a match on the time_key. But in the end all the different keys end up in the final object in Elastic, and the timestamp is the time the event was emitted by fluentd.

This is as far as I got, and I just put it aside for now. I find it very difficult to understand how fluentd treats time in general. If you have suggestions, I am all ears :)

Thanks for looking into it.

fabio-s-franco avatar Aug 21 '23 11:08 fabio-s-franco
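
A possible explanation and workaround for the behavior described above, offered as an unverified sketch. As far as I understand multi_format, it moves on to the next pattern only when a parser fails to parse the text, and a JSON parser does not fail just because its time_key is absent. If that is right, the first pattern accepts every valid JSON body and the event gets the current time, which matches what was observed. One alternative is to parse the body as plain JSON in the source (no time_key), copy whichever timestamp key is present into a single field, and then let a parser filter turn that field into the event time. The field name event_ts is hypothetical; the four key names and the time_format come from the config above.

```
<label @HTTP>
  <filter **>
    @type record_transformer
    enable_ruby true
    # Drop the original timestamp keys once the value has been copied
    remove_keys Timestamp,@t,@timestamp,timestamp
    <record>
      # event_ts is a hypothetical field holding the first timestamp key found
      event_ts ${record["Timestamp"] || record["@t"] || record["@timestamp"] || record["timestamp"]}
    </record>
  </filter>

  <filter **>
    @type parser
    key_name event_ts
    reserve_data true
    remove_key_name_field true
    <parse>
      @type regexp
      # Capture the whole value as the time field
      expression /^(?<time>.+)$/
      time_key time
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      utc true
    </parse>
  </filter>

  # existing <match> sections follow here
</label>
```

Events that carry none of the four keys would leave event_ts empty, and the parser filter would then report an error for them, so a real setup would need to decide how to handle that case.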