fluentd
Max nesting level for json parser
Is your feature request related to a problem? Please describe.
I would like the json parser plugin to have an option that limits the nesting level it parses. My developers send huge metadata JSON, and after parsing it "eats" Elasticsearch fields.
curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields' | grep -E '^ "' 2>/dev/null | grep metadata | awk -F. '{print NF}' | sort -n | wc -l
733
curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields' | grep -E '^ "' 2>/dev/null | grep context | awk -F. '{print NF}' | sort -n | wc -l
218
But if I could limit the nesting level for parsing, it would dramatically decrease the field count:
curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields' | grep -E '^ "' 2>/dev/null | grep metadata | awk -F. 'NF>5 {print NF}' | sort -n | wc -l
101
curl --progress-bar "http://127.0.0.1:9200/index-logs1/_field_caps?fields=*" | jq '.fields' | grep -E '^ "' 2>/dev/null | grep context | awk -F. 'NF>5 {print NF}' | sort -n | wc -l
25
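For readers who do not use awk, the counting step in the pipelines above can be sketched in Ruby. The field names here are made up for illustration (they are not taken from the real index), and `count_deeper_than` is a hypothetical helper. It mirrors `grep <name> | awk -F. 'NF>5'`: keep field names containing the given word and count those with more than a given number of dot-separated segments.

```ruby
# Hypothetical field names, standing in for the keys returned by _field_caps.
fields = %w[
  metadata.user.id
  metadata.a.b.c.d.e
  context.request.headers.x.y.z
  msg
]

# Count field names that contain `word` and have more than `levels`
# dot-separated segments (the Ruby counterpart of awk -F. 'NF>levels').
def count_deeper_than(fields, word, levels)
  fields.count { |f| f.include?(word) && f.split('.').length > levels }
end

puts count_deeper_than(fields, 'metadata', 0) # all metadata fields
puts count_deeper_than(fields, 'metadata', 5) # metadata fields deeper than 5 levels
```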
Describe the solution you'd like
Add a parameter to the json parser section: max_nesting (int).
The parser would then leave the JSON unparsed once the nesting limit is reached.
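To make the requested behaviour concrete, here is a minimal sketch of what "leave the JSON unparsed below a certain depth" could look like. It is not fluentd code: `limit_nesting` is a hypothetical helper, and it works by parsing the whole document with Ruby's stdlib json and then re-serializing anything deeper than the limit, which is simpler (and slower) than stopping inside the parser.

```ruby
require 'json'

# Hypothetical helper: keep hashes and arrays up to max_depth levels,
# and turn anything nested deeper back into a JSON string.
def limit_nesting(value, max_depth, depth = 1)
  case value
  when Hash
    return JSON.generate(value) if depth > max_depth
    value.transform_values { |v| limit_nesting(v, max_depth, depth + 1) }
  when Array
    return JSON.generate(value) if depth > max_depth
    value.map { |v| limit_nesting(v, max_depth, depth + 1) }
  else
    value # scalars are kept as they are
  end
end

record = JSON.parse('{"metadata":{"user":{"id":1,"tags":["a","b"]}},"msg":"ok"}')
limited = limit_nesting(record, 2)
puts limited.inspect
```

With a limit of 2, `metadata` stays a hash but `metadata.user` becomes a single string field, so Elasticsearch would map one field instead of one per nested key.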
Describe alternatives you've considered
Additional context
As far as I can see, this parameter is supported by the main Ruby JSON libraries.
One more question: is there a way to change the DEFAULT_OJ_OPTIONS variable?
If I understand the logic in the sources correctly, oj looks like the default parser. But as far as I can see, fluentd uses yajl for the parse_io method, so I am confused about which parser is used by default.
> Is there a way to change the DEFAULT_OJ_OPTIONS variable?
It seems there is no way to do it (pull request is welcome :smile:)
> If I understand the logic in the sources correctly, oj looks like the default parser. But as far as I can see, fluentd uses yajl for the parse_io method, so I am confused about which parser is used by default.
It seems that oj is optional: fluentd uses oj if it's available, but it isn't a mandatory dependency. yajl, on the other hand, is mandatory. If oj isn't installed, fluentd falls back to yajl.
https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/fluentd.gemspec#L23
https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/fluentd.gemspec#L52
https://github.com/fluent/fluentd/blob/6a2852ab9ac1158ee1982220f77b967b3ede82c1/lib/fluent/plugin/parser_json.rb#L61-L71
In addition, the documentation for this plugin describes yajl as follows:
yajl: Mainly for stream parsing
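The fallback described above can be sketched roughly as follows. This is an illustration, not the code in parser_json.rb: `pick_json_parser` is a hypothetical name, and the last-resort fallback to Ruby's stdlib json is an addition of this sketch so that it runs even where neither gem is installed (fluentd itself always has yajl available as a mandatory dependency).

```ruby
require 'json'

# Hypothetical helper: return [name, loader] for the first parser that loads.
# Order: oj (optional gem) -> yajl -> stdlib json (sketch-only last resort).
def pick_json_parser(preferred = :oj)
  case preferred
  when :oj
    require 'oj'
    [:oj, ->(s) { Oj.load(s, mode: :strict) }]
  when :yajl
    require 'yajl'
    [:yajl, ->(s) { Yajl::Parser.parse(s) }]
  else
    [:json, ->(s) { JSON.parse(s) }]
  end
rescue LoadError
  # The preferred library is not installed: try the next one in line.
  pick_json_parser(preferred == :oj ? :yajl : :json)
end

name, loader = pick_json_parser
puts "using #{name}"
puts loader.call('{"a":{"b":1}}').inspect
```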
> It seems that oj is optional: fluentd uses oj if it's installed, but it isn't a mandatory dependency. yajl, on the other hand, is mandatory. If oj isn't installed, fluentd falls back to yajl.
However, this is surely confusing. Because it's not documented, users can't understand this behavior. We should update the documentation: https://github.com/fluent/fluentd-docs-gitbook/blob/1.0/parser/json.md
> We should update the documentation: https://github.com/fluent/fluentd-docs-gitbook/blob/1.0/parser/json.md
https://github.com/fluent/fluentd-docs-gitbook/pull/298
Fixed by #3315
You can use FLUENT_OJ_OPTION_MAX_NESTING for it.
Now I've noticed that Oj.default_options doesn't accept :max_nesting: https://www.rubydoc.info/github/ohler55/oj/Oj.default_options
It's reported at https://app.slack.com/client/T0CSKNZLK/C0CTT63EE/thread/C0CTT63EE-1631532462.067500
We should consider another way to apply it.
Does FLUENT_OJ_OPTION_MAX_NESTING still not work?
> Does FLUENT_OJ_OPTION_MAX_NESTING still not work?
Yes, it doesn't work. Now that I've noticed Oj.default_options doesn't support it, I'll remove it.
Instead, I'm considering adding a max_nesting parameter to parser_json.
The implementation of Oj:
- https://github.com/ohler55/oj/blob/e2c0fbde9cf13e149ae6b16d0e83ce23f47bb256/ext/oj/oj.c#L171-L233
- https://github.com/ohler55/oj/blob/e2c0fbde9cf13e149ae6b16d0e83ce23f47bb256/ext/oj/oj.c#L607-L955
max_nesting isn't supported by Oj.default_options.
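For comparison, Ruby's stdlib json does take max_nesting as a per-call option, which is one shape a parser_json parameter could follow. The snippet below only shows the stdlib behaviour. It says nothing about what Oj accepts per call, and note that stdlib json raises an error when the limit is exceeded rather than leaving the deeper part unparsed as the original request asks.

```ruby
require 'json'

deep = '{"a":{"b":{"c":{"d":1}}}}' # four levels of nested objects

# With a limit of 3, the fourth level triggers JSON::NestingError.
result =
  begin
    JSON.parse(deep, max_nesting: 3)
    :parsed
  rescue JSON::NestingError
    :too_deep
  end
puts result

# With a limit of 4, the same document parses normally.
parsed = JSON.parse(deep, max_nesting: 4)
puts parsed.inspect
```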