fluent-plugin-multi-format-parser
Use time_format and types on multi_format parse
I currently parse one tag with a simple filter but I need to add a different pattern to the filter and I'm planning to migrate to a multi_format plugin.
My question is whether it's possible to use the time_format and types fields on each pattern, like this:
<filter kubernetes.var.log.containers.traefik-ingress-**.log>
@type parser
key_name log
reserve_data yes
<parse>
@type multi_format
<pattern>
format regexp
expression /^(?<ip>[^-]*) - - \[(?<datetime>[^\]]*)\] "(?<method>[^ ]*) (?<path>[^ ]*) (?<http_version>[^"]*)" (?<status_code>[^ ]*) (?<body_bytes>[^ ]*) "(?<referer>[^"]*)" "(?<user_agent>[^"]*)" (?<seq>[^ ]*) "(?<domain>[^ ]*)" "(?<dest_url>[^ ]*)" (?<response_time>[^ ms]*)/
time_key datetime
time_format %d/%b/%Y:%H:%M:%S %z
types status_code:integer,body_bytes:integer,seq:integer,response_time:integer
</pattern>
<pattern>
format regexp
expression /^\[(?<datetime>[^\]]*)\] - (?<data>[^ ]*)/
</pattern>
</parse>
</filter>
Probably not, because I get this when I use your example:
2019-02-15 09:35:16 +0000 [warn]: dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not match with data 'time=\"2019-02-15T09:35:16Z\" level=debug msg=\"Skipping Kubernetes event kind *v1.Endpoints\"\n'" location=nil tag="kubernetes.var.log.containers.traefik-ingress-controller-d2bv5_kube-system_traefik-ingress-lb-3f5d86da4a51d9909d67acae3fa9c8ddf1ba9f83f6b4ece55728a53b41175bae.log" time=2019-02-15 09:35:16.418231659 +0000 record={"log"=>"time=\"2019-02-15T09:35:16Z\" level=debug msg=\"Skipping Kubernetes event kind *v1.Endpoints\"\n", "stream"=>"stdout", "docker"=>{"container_id"=>"3f5d86da4a51d9909d67acae3fa9c8ddf1ba9f83f6b4ece55728a53b41175bae"}, "kubernetes"=>{"container_name"=>"traefik-ingress-lb", "namespace_name"=>"kube-system", "pod_name"=>"traefik-ingress-controller-d2bv5", "container_image"=>"traefik:latest", "container_image_id"=>"docker-pullable://traefik@sha256:79a9b27986068895c5deb438099fbd3072ed645cdcabc72af24e229f868c4cf2", "pod_id"=>"101009d3-2c79-11e9-b433-de1a34070007", "labels"=>{"controller-revision-hash"=>"696f6f7df", "k8s-app"=>"traefik-ingress-lb", "name"=>"traefik-ingress-lb", "pod-template-generation"=>"1"}, "host"=>"kube-master1", "master_url"=>"https://10.96.0.1:443/api", "namespace_id"=>"9e8bebb0-11a2-11e9-9ca2-de1a34070007"}}
So please, does someone know where we should add the types field? I've tried multiple places and nothing happened: all my fields stay string types :-/
@nargmarg If you have a problem, please post your configuration and an actual log example. We are not psychic, so it is hard to answer without that information :)
You're right, sorry. Below you will find one part of my fluentd config (the output part is omitted, but there is no need to share that):
<filter kubernetes.**>
@id filter_parser
@type parser
key_name log
reserve_time true
reserve_data true
remove_key_name_field true
<parse>
@type multi_format
<pattern>
format /(?<web.remote_addr>[^ ]*) - - \[(?<web.time>[^\]]*)\] "(?<web.method>\S+)(?: +(?<web.request>[^\"]*) +\S*)?" (?<web.code>[^ ]*) (?<web.size>[^ ]*) "(?<web.referer>[^\"]*)" "(?<web.agent>[^\"]*)" (?<web.request_length>[^ ]*) (?<web.request_time>[^ ]*) \[(?<web.namespace_service_port>[^ ]*)\] (?<web.upstream_addr>[^ ]*) (?<web.upstream_response_length>[^ ]*) (?<web.upstream_response_time>[^ ]*) (?<web.upstream_status>[^ ]*)/
types web.code:integer,web.size:integer,web.request_length:integer,web.request_time:float,web.upstream_addr:array,web.upstream_response_length:integer,web.upstream_response_time:float,web.upstream_status:integer
</pattern>
<pattern>
format /time="(?<external_dns.time>[^ ]*)" level=(?<external_dns.level>[^ ]*) msg="(?<external_dns.msg>[^\"]*)"/
</pattern>
<pattern>
format json
</pattern>
</parse>
</filter>
I don't know if it's the right place for the types field, but I have tried many places and nothing happened. All types stay as strings; nothing is converted...
Do you have any advice? Thanks a lot. (Please don't pay attention to the indentation.)
The same for me. I parse JSON-escaped logs from the ingress controller and all fields end up as text type :/ Probably this plugin ignores the types keyword. @repeatedly Can you confirm or deny whether it's possible to define types for fields as described in https://docs.fluentd.org/configuration/parse-section#parse-parameters
I was wondering if @repeatedly is still active. Seen some stuff that would be quite useful (like open PRs) and answers to questions like this.
I know you can use time_format in multi_format. Now I want to know whether I can handle messages with the same overall format (json) but with different names for time_key.
I know you can use time_format in multi_format
Yes. This plugin forwards configurations and events to the actual parser plugins, so parser features should work. I tested with a simple configuration and it works as expected.
<source>
@type sample
sample {"hello":"world","log":"{\"key\":\"value\",\"event_time\":\"22/Feb/2022:12:00:00 +0900\",\"num\":\"100\"}"}
tag sample
</source>
<filter sample>
@type parser
key_name log
<parse>
@type multi_format
<pattern>
format json
time_key event_time
time_format %d/%b/%Y:%H:%M:%S %z
types num:integer
</pattern>
<pattern>
format none
</pattern>
</parse>
</filter>
<match sample>
@type stdout
</match>
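To make the effect of that first pattern concrete, here is a rough Python model of what the json pattern is expected to do with the inner log string: read the time from event_time using the given time_format, drop that key (keep_time_key is left at its default of false), and cast num to an integer. The function name parse_pattern and its arguments are made up for illustration; this is a sketch of the behavior, not the plugin's code.

```python
import json
from datetime import datetime

# The inner "log" string from the sample record above.
log = '{"key":"value","event_time":"22/Feb/2022:12:00:00 +0900","num":"100"}'


def parse_pattern(text, time_key, time_format, types):
    """Hypothetical model of one <pattern> with format json."""
    record = json.loads(text)
    # The time field is removed from the record and becomes the event time.
    event_time = datetime.strptime(record.pop(time_key), time_format)
    # "types num:integer" is modeled as a per-field cast.
    for field, cast in types.items():
        if field in record:
            record[field] = cast(record[field])
    return event_time, record


event_time, record = parse_pattern(
    log, "event_time", "%d/%b/%Y:%H:%M:%S %z", {"num": int}
)
print(event_time.isoformat(), record)
```

If types were ignored, num would stay the string "100"; in this model it becomes the integer 100.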
with different names for time_key.
What does this mean? incoming events have different time key names like below?
{"k":"v1","time_key1":"time_value1"}
{"k":"v2","time_key2":"time_value2"}
{"k":"v3","time_key3":"time_value3"}
{"k":"v4","time_key2":"time_value3"}
...
{"k":"vN","time_key1":"time_valueN"}
What does this mean? incoming events have different time key names like below?
Exactly @repeatedly. Been struggling with this for a while and it doesn't really seem to work. For example:
<source>
@type http
bind 0.0.0.0
port 5880
<parse>
@type multi_format
<pattern>
format json
time_key Timestamp
keep_time_key false
utc true
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
<pattern>
format json
time_key @t
keep_time_key false
utc true
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
<pattern>
format json
time_key @timestamp
keep_time_key false
utc true
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
<pattern>
format json
time_key timestamp
keep_time_key false
utc true
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
</parse>
@label @HTTP
</source>
My expectation was that it would go through the patterns until one matches on its time_key. But in the end, all the different keys end up in the final object in Elasticsearch, and the timestamp is the time the event was emitted by fluentd.
This is as far as I got, and I have put it aside for now. I find it very difficult to understand how fluentd treats time in general. If you have suggestions, I am all ears :)
Thanks for looking into it.
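One possible explanation for this result, sketched in Python under two assumptions that are not confirmed in this thread: (1) multi_format tries patterns in order and accepts the first one whose parser succeeds, and (2) a json parser does not fail when its time_key is missing, but falls back to the current time. Under those assumptions the first pattern accepts every valid JSON line, so the later patterns are never reached. All names below (TIME_KEYS, json_pattern, first_match, any_time_key) are hypothetical.

```python
import json

# Hypothetical list mirroring the four time_key values in the config above.
TIME_KEYS = ["Timestamp", "@t", "@timestamp", "timestamp"]


def json_pattern(time_key):
    """Model of one `format json` pattern (an assumption, not plugin code)."""
    def parse(text):
        record = json.loads(text)  # raises ValueError on non-JSON input
        # Assumed: a missing time key is not an error; the current time is
        # used instead, modeled here as the placeholder string "now".
        event_time = record.pop(time_key, "now")
        return event_time, record
    return parse


def first_match(parsers, text):
    """Try parsers in order; return the first result that does not raise."""
    for parse in parsers:
        try:
            return parse(text)
        except ValueError:
            continue
    return None, None


def any_time_key(text, keys=TIME_KEYS):
    """Alternative: parse once, then take the first candidate key present."""
    record = json.loads(text)
    for key in keys:
        if key in record:
            return record.pop(key), record
    return "now", record


line = '{"k": "v2", "@t": "2022-02-22T03:00:00.000Z"}'
parsers = [json_pattern(k) for k in TIME_KEYS]
print(first_match(parsers, line))   # first pattern wins, "@t" stays in record
print(any_time_key(line))           # "@t" is picked up as the time
```

In this model, first_match reproduces the symptom described above (the time key stays in the record and the event time is "now"), while any_time_key shows the behavior that was expected. How to express the second approach in a Fluentd configuration is not covered by this sketch.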