
Max nesting level for json parser

Open bazzilio opened this issue 3 years ago • 9 comments

Is your feature request related to a problem? Please describe. I want the json parser plugin to have an option that limits the nesting level it parses. My developers send huge metadata JSON, and once it is parsed it "eats" Elasticsearch fields.

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep metadata | awk -F. '{print NF}' | sort -n | wc -l 
733

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep context | awk -F. '{print NF}' | sort -n | wc -l 
218

But if I could limit the nesting level for parsing, the field count would decrease dramatically:

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep metadata | awk -F. 'NF>5 {print NF}' | sort -n | wc -l 
101

curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields'  | grep -E '^  "' 2>/dev/null | grep context | awk -F. 'NF>5 {print NF}' | sort -n | wc -l 
25
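The same depth count can be sketched in Ruby, for readers who would rather not chain grep and awk. This is only an illustration: the field names below are made up and stand in for the keys of the `fields` object returned by `_field_caps`. Depth is the number of dot-separated segments, which mirrors `awk -F. '{print NF}'`.

```ruby
# Hypothetical field names; real ones would come from the _field_caps response.
fields = [
  "metadata.a",
  "metadata.x.y",
  "metadata.a.b.c.d.e",
  "metadata.a.b.c.d.e.f"
]

# Depth = number of dot-separated segments (same idea as NF with -F. in awk).
depths = fields.map { |name| name.split(".").length }

total_count = depths.length
deep_count  = depths.count { |d| d > 5 } # fields nested more than 5 segments deep

puts "total=#{total_count} deeper_than_5=#{deep_count}"
```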

Describe the solution you'd like Add a max_nesting (int) parameter to the json parser section, so that the parser leaves the JSON unparsed once the nesting limit is reached.
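A minimal sketch of the requested behavior, assuming it is acceptable to parse the whole document first and then collapse the deep parts. `limit_nesting` is a hypothetical helper written for this illustration, not a fluentd or parser API. Anything nested deeper than `max_depth` is turned back into a JSON string, so it ends up as a single field.

```ruby
require "json"

# Hypothetical helper (not part of fluentd): keep structure down to
# max_depth levels and re-serialize deeper hashes/arrays as JSON strings.
def limit_nesting(value, max_depth, depth = 1)
  case value
  when Hash
    return JSON.generate(value) if depth > max_depth
    value.transform_values { |v| limit_nesting(v, max_depth, depth + 1) }
  when Array
    return JSON.generate(value) if depth > max_depth
    value.map { |v| limit_nesting(v, max_depth, depth + 1) }
  else
    value # scalars are kept as-is at any depth
  end
end

record  = JSON.parse('{"msg":"hi","metadata":{"user":{"id":1,"tags":["a","b"]}}}')
limited = limit_nesting(record, 2)

# "metadata.user" is now one string field instead of a subtree of fields.
puts JSON.generate(limited)
```

In a fluentd pipeline something like this would have to run after parsing (for example in a filter), which costs a full parse plus a re-serialization. A parser-level option would avoid that.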

Describe alternatives you've considered

Additional context As far as I can see, this parameter is supported by the main Ruby JSON libraries:
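As one illustration (not necessarily one of the libraries the author had in mind), Ruby's standard `json` library accepts a `max_nesting` option. It does not leave the deep part unparsed, though: it rejects the whole document with `JSON::NestingError`. So it differs from the behavior requested above.

```ruby
require "json"

deep = '{"a":{"b":{"c":1}}}' # three levels of nesting

outcome =
  begin
    JSON.parse(deep, max_nesting: 2)
    :parsed
  rescue JSON::NestingError
    :rejected # the stdlib parser raises instead of keeping the rest as a string
  end

puts outcome

# With a limit that covers the document, parsing succeeds as usual.
ok = JSON.parse(deep, max_nesting: 3)
```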

bazzilio avatar Mar 31 '21 16:03 bazzilio

One more question: is there a way to change the DEFAULT_OJ_OPTIONS variable? If I understand the logic in the sources correctly, oj looks like the default parser. But as far as I can see, fluentd uses yajl for the parse_io method, so I am confused about which parser is used by default.

bazzilio avatar Mar 31 '21 16:03 bazzilio

is there a way to change the DEFAULT_OJ_OPTIONS variable?

It seems there is no way to do it (pull request is welcome :smile:)

If I understand the logic in the sources correctly, oj looks like the default parser. But as far as I can see, fluentd uses yajl for the parse_io method, so I am confused about which parser is used by default.

It seems that oj is optional: fluentd uses oj if it's available, but oj is not a required dependency. yajl, on the other hand, is a required dependency. If oj isn't installed, fluentd falls back to yajl.

https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/fluentd.gemspec#L23
https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/fluentd.gemspec#L52
https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/lib/fluent/plugin/parser_json.rb#L61-L71
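The optional-dependency pattern described above can be sketched roughly as follows. This is an illustration only, not fluentd's actual code: `pick_json_loader` is a hypothetical name, and the final fallback to the standard `json` library is an assumption added so the sketch runs even when neither gem is installed.

```ruby
require "json"

# Illustration only: prefer oj, fall back to yajl, and finally (an assumption
# of this sketch, not fluentd behavior) to Ruby's standard json library.
def pick_json_loader
  begin
    require "oj"
    return [:oj, Oj.method(:load)]
  rescue LoadError
    nil # oj is optional; try the next candidate
  end

  begin
    require "yajl"
    return [:yajl, Yajl.method(:load)]
  rescue LoadError
    nil # yajl-ruby is not installed in this environment
  end

  [:json, JSON.method(:parse)]
end

name, loader = pick_json_loader
puts "using #{name}"
parsed = loader.call('{"k":"v"}')
```

Because the choice happens at load time, the same configuration can behave differently on hosts with and without the oj gem, which is why documenting the fallback matters.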

In addition, there is the following description about yajl in the document of this plugin:

yajl: Mainly for stream parsing

ashie avatar Apr 02 '21 02:04 ashie

It seems that oj is optional: fluentd uses oj if it's installed, but oj is not a required dependency. yajl, on the other hand, is a required dependency. If oj isn't installed, fluentd falls back to yajl.

However, it is surely confusing. Because it's not documented, users can't understand this behavior. We should update the document: https://github.com/fluent/fluentd-docs-gitbook/blob/1.0/parser/json.md

ashie avatar Apr 02 '21 02:04 ashie

We should update the document: https://github.com/fluent/fluentd-docs-gitbook/blob/1.0/parser/json.md

https://github.com/fluent/fluentd-docs-gitbook/pull/298

ashie avatar Apr 05 '21 23:04 ashie

Fixed by #3315. You can use FLUENT_OJ_OPTION_MAX_NESTING for it.

ashie avatar Jul 14 '21 06:07 ashie

Now I've noticed that Oj.default_options doesn't accept :max_nesting: https://www.rubydoc.info/github/ohler55/oj/Oj.default_options

It's reported at https://app.slack.com/client/T0CSKNZLK/C0CTT63EE/thread/C0CTT63EE-1631532462.067500

We should consider another way to apply it.

ashie avatar Sep 21 '21 08:09 ashie

Does FLUENT_OJ_OPTION_MAX_NESTING still not work?

vishalmamidi1 avatar Feb 14 '22 21:02 vishalmamidi1

Does FLUENT_OJ_OPTION_MAX_NESTING still not work?

Yes, it still doesn't work. I have since noticed that Oj.default_options doesn't support it, so I'll remove it. Instead, I'm considering adding a max_nesting parameter to parser_json.

ashie avatar Mar 01 '22 02:03 ashie

The implementation of Oj:

  • https://github.com/ohler55/oj/blob/e2c0fbde9cf13e149ae6b16d0e83ce23f47bb256/ext/oj/oj.c#L171-L233
  • https://github.com/ohler55/oj/blob/e2c0fbde9cf13e149ae6b16d0e83ce23f47bb256/ext/oj/oj.c#L607-L955

max_nesting isn't supported by Oj.default_options.

ashie avatar Oct 27 '22 05:10 ashie