
parser: regex: Do not skip empty regex group matches

Open nigels-com opened this issue 5 years ago • 7 comments

Regular Expression Parser is skipping empty values #1486

Unlike the other parsers, the regex parser omits empty regex groups from the output.

Sample setup:

$ cat sample.in 
{"log": "{\"time_local\":\"2019-07-31T21:17:15\",\"client_ip\":\"\"}"}

$ cat sample.conf 
[SERVICE]
    Flush                     5
    Parsers_File              parsers.conf

[INPUT]
    Name         stdin

[FILTER]
    Name         parser
    Parser       json_regex
    Match        *
    Key_Name     log
    Reserve_Data On
    Preserve_Key On

[OUTPUT]
    Name            stdout
    Format          json_lines

$ cat parsers.conf 
[PARSER]
    Name   json_regex
    Format regex
    Regex  ^{"time_local":"(?<time_local>.*?)","client_ip":"(?<client_ip>.*?)"}$

Output with this patch applied:

$ cat sample.in | bin/fluent-bit -c sample.conf -p parsers.conf 
Fluent Bit v1.4.0
Copyright (C) Treasure Data

[2020/01/27 10:21:55] [ info] [storage] initializing...
[2020/01/27 10:21:55] [ info] [storage] in-memory
[2020/01/27 10:21:55] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/01/27 10:21:55] [ info] [engine] started (pid=8468)
[2020/01/27 10:21:55] [ info] [sp] stream processor started
[2020/01/27 10:21:55] [ warn] [in_stdin] end of file (stdin closed by remote end)
[2020/01/27 10:21:55] [ info] [input] pausing stdin.0
{"date":1580084515.652593,"time_local":"2019-07-31T21:17:15","client_ip":"","log":"{\"time_local\":\"2019-07-31T21:17:15\",\"client_ip\":\"\"}"}
[2020/01/27 10:21:55] [ warn] [engine] service will stop in 5 seconds
[2020/01/27 10:21:59] [ info] [engine] service stopped

Without this change, the "client_ip":"" entry would be missing from the output.

nigels-com, Jan 27 '20 00:01

I think a hazard of this change is that the output no longer distinguishes a group that matched an empty string from an optional group that did not match at all.

For example:

$ cat parsers2.conf 
[PARSER]
    Name   json_regex
    Format regex
    Regex  ^{"time_local":"(?<time_local>.*?)"(,"client_ip":"(?<client_ip>.*?)")?}$

$ cat sample2.in 
{"log": "{\"time_local\":\"2019-07-31T21:17:15\",\"client_ip\":\"\"}"}
{"log": "{\"time_local\":\"2019-07-31T21:17:15\"}"}

$ cat sample2.in | bin/fluent-bit -c sample2.conf -p parsers2.conf 
Fluent Bit v1.4.0
Copyright (C) Treasure Data

[2020/01/27 10:31:24] [ info] [storage] initializing...
[2020/01/27 10:31:24] [ info] [storage] in-memory
[2020/01/27 10:31:24] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/01/27 10:31:24] [ info] [engine] started (pid=10386)
[2020/01/27 10:31:24] [ info] [sp] stream processor started
[2020/01/27 10:31:24] [ warn] [in_stdin] end of file (stdin closed by remote end)
[2020/01/27 10:31:24] [ info] [input] pausing stdin.0
{"date":1580085084.179838,"time_local":"2019-07-31T21:17:15","client_ip":"","log":"{\"time_local\":\"2019-07-31T21:17:15\",\"client_ip\":\"\"}"}
{"date":1580085084.179842,"time_local":"2019-07-31T21:17:15","client_ip":"","log":"{\"time_local\":\"2019-07-31T21:17:15\"}"}
[2020/01/27 10:31:24] [ warn] [engine] service will stop in 5 seconds
[2020/01/27 10:31:28] [ info] [engine] service stopped
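
The distinction is visible at the regex level. As a minimal sketch in Python (not Fluent Bit's code: Python's `re` writes named groups as `(?P<name>...)` where the parser config above uses `(?<name>...)`, and the `parse` helper is hypothetical), the same pattern reports an empty match as `""` and a non-participating optional group as `None`:

```python
import re

# Same pattern as parsers2.conf, translated to Python's named-group syntax.
# Assumption: Python's `re` stands in for Fluent Bit's regex engine here.
PATTERN = re.compile(
    r'^\{"time_local":"(?P<time_local>.*?)"'
    r'(,"client_ip":"(?P<client_ip>.*?)")?\}$'
)


def parse(line):
    """Hypothetical helper: return the named groups, or None if no match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


# client_ip is present but empty: the group participates and captures "".
empty = parse('{"time_local":"2019-07-31T21:17:15","client_ip":""}')

# client_ip is absent: the optional group does not participate, so
# groupdict() reports it as None rather than "".
omitted = parse('{"time_local":"2019-07-31T21:17:15"}')

print(repr(empty["client_ip"]))
print(repr(omitted["client_ip"]))
```

In the Fluent Bit output above, both records carry "client_ip":"", so that "did not participate" signal is lost by the time the record is emitted.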

nigels-com, Jan 27 '20 00:01

Hmm, I suggest introducing a new parser configuration property called Skip_Empty_Keys, set to true by default, so that your patch takes effect only when the property is set to false. That way we won't break existing deployments.
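
The proposed semantics can be modelled in a few lines of Python. This is only a sketch of the behavior described in this thread, not Fluent Bit's implementation; the `parse_record` function and its `skip_empty_keys` argument are hypothetical names that mirror the suggested property, and Python's `(?P<name>...)` group syntax replaces the `(?<name>...)` form used in parsers.conf:

```python
import re

# Pattern from the first parsers.conf example, in Python's named-group syntax.
PATTERN = re.compile(
    r'^\{"time_local":"(?P<time_local>.*?)","client_ip":"(?P<client_ip>.*?)"\}$'
)


def parse_record(line, skip_empty_keys=True):
    """Hypothetical model of the proposed Skip_Empty_Keys property.

    With skip_empty_keys=True (the suggested default), groups that matched
    an empty string are dropped, as before the patch. With False, they are
    kept as empty strings.
    """
    m = PATTERN.match(line)
    if m is None:
        return None
    record = {}
    for key, value in m.groupdict().items():
        if value == "" and skip_empty_keys:
            continue  # previous behavior: empty groups are omitted
        record[key] = value
    return record


line = '{"time_local":"2019-07-31T21:17:15","client_ip":""}'
default_record = parse_record(line)
patched_record = parse_record(line, skip_empty_keys=False)
print(default_record)
print(patched_record)
```

Defaulting the flag to true keeps existing configurations producing the same records, which is the compatibility concern raised here.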

edsiper, May 05 '20 23:05

ping

edsiper, Jun 30 '20 18:06

Oh, thanks for the ping. Had completely forgotten about this one.

nigels-com, Jun 30 '20 22:06

Updated the PR with the Skip_Empty_Keys configuration property. I will also go ahead and update the documentation.

nigels-com, Sep 18 '20 05:09

@nigels-com

  • pls fix conflicts
  • add DCO

edsiper, Dec 13 '20 19:12

@nigels-com What is the status of this PR? If you have forgotten about it, is it OK if I create another PR that takes the same approach?

nokute78, Mar 12 '21 23:03