Make sure JSON parser returns Hash
Which issue(s) this PR fixes: Fixes #4100
What this PR does / why we need it:
Make sure the JSON parser returns a Hash.
It is wrong for a Parser to return a record that is not a Hash, because subsequent processing may fail with errors.
The result of parsing JSON can be a value other than a Hash; the data represented by JSON was never limited to objects.
- https://www.rfc-editor.org/rfc/rfc8259#section-2
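As RFC 8259 says, a JSON text can be any JSON value, not just an object. Ruby's standard `JSON.parse` reflects this, which is why a parser that blindly returns the parse result can hand non-Hash records to downstream plugins:

```ruby
require "json"

# A top-level JSON value can be any type, not just an object (Hash).
puts JSON.parse('{"k":"v"}').class    # => Hash
puts JSON.parse('[{"k":"v"}]').class  # => Array
puts JSON.parse('"HELLO"').class      # => String
puts JSON.parse('42').class           # => Integer
```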
The current implementation of JSONParser does not appear to take this into account sufficiently.
We need to carefully specify what should happen when raw data cannot be parsed as a Hash.
The currently intended use cases appear to be a Hash or an Array of Hashes.
We should treat only these two patterns as parsable for now.
in_http assumed that the Parser could return an Array, so it needed to be fixed along with this change.
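The rule above (accept a Hash, or an Array whose elements are all Hashes, and reject anything else) can be sketched roughly as follows. This is an illustrative sketch, not the actual patch; the method name and the simplified `ParserError` class are made up for the example:

```ruby
require "json"

# Stand-in for Fluent::Plugin::Parser::ParserError.
class ParserError < StandardError; end

# Hypothetical sketch of the new validation: only a Hash or an
# Array of Hashes counts as a successful parse.
def parse_json_as_records(text)
  parsed = JSON.parse(text)
  case parsed
  when Hash
    [parsed]
  when Array
    unless parsed.all? { |r| r.is_a?(Hash) }
      raise ParserError, "pattern not matched with data '#{text}'"
    end
    parsed
  else
    raise ParserError, "pattern not matched with data '#{text}'"
  end
end
```

With this check, `'"HELLO"'` and `'HELLO'`-style inputs both surface as a ParserError instead of leaking a String record downstream.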
MessagePackParser appears to have the same issue. (Not fixed in this PR.)
Docs Changes: Not needed.
Release Note: Same as the title.
Difference
Case: No subsequent processing
Config
<source>
@type tcp
tag test.tcp
<parse>
@type json
</parse>
</source>
<match test.**>
@type stdout
</match>
Operation
$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]
Result before this fix
2023-03-21 23:30:30.623063804 +0900 test.tcp: [{"k":"v"},{"k2":"v2"}]
Result after this fix
2023-03-21 23:29:20.147728222 +0900 test.tcp: {"k":"v"}
2023-03-21 23:29:20.147754433 +0900 test.tcp: {"k2":"v2"}
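The after-fix output can be sketched as: when the parser yields an Array of Hashes, each element becomes its own event. A minimal sketch, assuming a hypothetical `split_and_emit` helper (not the actual fluentd code):

```ruby
require "json"

# Hypothetical sketch of the post-fix behavior in inputs such as
# in_tcp: an Array parsed from one raw chunk is emitted as one
# event per element.
def split_and_emit(tag, raw)
  parsed = JSON.parse(raw)
  records = parsed.is_a?(Array) ? parsed : [parsed]
  records.each { |r| puts "#{tag}: #{r.to_json}" }
  records
end

split_and_emit("test.tcp", '[{"k":"v"}, {"k2":"v2"}]')
# prints:
#   test.tcp: {"k":"v"}
#   test.tcp: {"k2":"v2"}
```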
Case: With subsequent filter processing
Config
<source>
@type tcp
tag test.tcp
<parse>
@type json
</parse>
</source>
<filter test.**>
@type record_transformer
enable_ruby true
<record>
class ${record.class}
</record>
</filter>
<match test.**>
@type stdout
</match>
Operation
$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]
Result before this fix
2023-03-21 23:30:51 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.tcp" time=2023-03-21 23:30:51.040692881 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]
Result after this fix
2023-03-21 23:29:55.783183244 +0900 test.tcp: {"k":"v","class":"Hash"}
2023-03-21 23:29:55.783215076 +0900 test.tcp: {"k2":"v2","class":"Hash"}
Additionally, confirming the difference in behavior with filter_parser:
<source>
@type sample
tag test.hash
sample {"message": "{\"k\":\"v\"}"}
</source>
<source>
@type sample
tag test.string
sample {"message": "\"HELLO\""}
</source>
<source>
@type sample
tag test.invalid
sample {"message": "HELLO"}
</source>
<source>
@type sample
tag test.array
sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>
<filter test.**>
@type parser
key_name message
<parse>
@type json
</parse>
</filter>
<filter test.**>
@type record_transformer
enable_ruby true
<record>
class ${record.class}
</record>
</filter>
<match test.**>
@type stdout
</match>
Before this fix
Some patterns result in errors in subsequent processing.
2023-03-21 23:50:24.068859931 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.array" time=2023-03-21 23:50:25.070141860 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'HELLO'" location=nil tag="test.invalid" time=2023-03-21 23:50:25.071323231 +0900 record={"message"=>"HELLO"}
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.string" time=2023-03-21 23:50:25.072303009 +0900 record="HELLO"
After this fix
Now a ParserError correctly occurs in filter_parser.
I left a comment in the code noting that filter_parser cannot parse multiple records from one raw record.
(This is a limitation of the current specification.)
2023-03-21 23:43:25.061681273 +0900 test.array: {"k":"v","class":"Hash"}
2023-03-21 23:43:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'HELLO'" location=nil tag="test.invalid" time=2023-03-21 23:43:25.062907066 +0900 record={"message"=>"HELLO"}
2023-03-21 23:43:25.063493027 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 23:43:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '\"HELLO\"'" location=nil tag="test.string" time=2023-03-21 23:43:25.063894687 +0900 record={"message"=>"\"HELLO\""}
Sorry about the timing relative to the v1.16.0 release.
I understand that this fix will not make it into v1.16.0.
I confirmed the failing tests. They seem to be flaky and not related to this fix.
TODO: Fix other parsers.
Sorry for the delay. I'm going to fix this in the next release.
I have added a new PR for this:
- #4474