fluentd icon indicating copy to clipboard operation
fluentd copied to clipboard

Make sure JSON parser returns Hash

Open daipom opened this issue 2 years ago • 6 comments

Which issue(s) this PR fixes: Fixes #4100

What this PR does / why we need it: Make sure JSON parser returns Hash. It is wrong for Parser to return a record that is not Hash. Subsequent processing may result in errors.

The result of parsing JSON can be a value other than Hash. Originally, the data represented by JSON was not limited to Hash.

  • https://www.rfc-editor.org/rfc/rfc8259#section-2

The current implementation of JSONParser does not appear to have sufficiently considered this.

We need to consider carefully the specification of the case where raw data can not be parsed as Hash.

The currently intended usecases appears to be Hash or Array of Hash. We should consider only these 2 patterns as parsable for now.

in_http assumed that Parser could return Array. Therefore, it needed to be fixed along with this.

MessagePackParser appears to have the same issue. (Not fixed in this PR.)

Docs Changes: Not needed.

Release Note: Same as the title.

Difference

Case: No subsequent processing

Config

<source>
  @type tcp
  tag test.tcp
  <parse>
    @type json
  </parse>
</source>

<match test.**>
  @type stdout
</match>

Operation

$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]

Result before this fix

2023-03-21 23:30:30.623063804 +0900 test.tcp: [{"k":"v"},{"k2":"v2"}]

Result after this fix

2023-03-21 23:29:20.147728222 +0900 test.tcp: {"k":"v"}
2023-03-21 23:29:20.147754433 +0900 test.tcp: {"k2":"v2"}

Case: With subsequent filter processing

Config

<source>
  @type tcp
  tag test.tcp
  <parse>
    @type json
  </parse>
</source>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Operation

$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]

Result before this fix

2023-03-21 23:30:51 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.tcp" time=2023-03-21 23:30:51.040692881 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]

Result after this fix

2023-03-21 23:29:55.783183244 +0900 test.tcp: {"k":"v","class":"Hash"}
2023-03-21 23:29:55.783215076 +0900 test.tcp: {"k2":"v2","class":"Hash"}

daipom avatar Mar 21 '23 02:03 daipom

Additional confirming of the difference of the behavior,

<source>
  @type sample
  tag test.hash
  sample {"message": "{\"k\":\"v\"}"}
</source>

<source>
  @type sample
  tag test.string
  sample {"message": "\"HELLO\""}
</source>

<source>
  @type sample
  tag test.invalid
  sample {"message": "HELLO"}
</source>

<source>
  @type sample
  tag test.array
  sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Before this fix

Some patterns result in errors in subsequent processing.

2023-03-21 23:50:24.068859931 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.array" time=2023-03-21 23:50:25.070141860 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'HELLO'" location=nil tag="test.invalid" time=2023-03-21 23:50:25.071323231 +0900 record={"message"=>"HELLO"}
2023-03-21 23:50:25 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for \"HELLO\":String" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.string" time=2023-03-21 23:50:25.072303009 +0900 record="HELLO"

After this fix

Correctly, ParserError occurs in filter_parser.

I left a comment in the code about filter_parser not being able to parse multiple records from one raw record. (This is a limitation of the current specification.)

2023-03-21 23:43:25.061681273 +0900 test.array: {"k":"v","class":"Hash"}
2023-03-21 23:43:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'HELLO'" location=nil tag="test.invalid" time=2023-03-21 23:43:25.062907066 +0900 record={"message"=>"HELLO"}
2023-03-21 23:43:25.063493027 +0900 test.hash: {"k":"v","class":"Hash"}
2023-03-21 23:43:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '\"HELLO\"'" location=nil tag="test.string" time=2023-03-21 23:43:25.063894687 +0900 record={"message"=>"\"HELLO\""}

daipom avatar Mar 21 '23 14:03 daipom

Sorry for the timing of the v1.16.0 release.

I understand that we would not be able to put this fix in v1.16.0.

daipom avatar Mar 21 '23 15:03 daipom

I confirmed the failing tests. They seem to be unstable tests and not related to this fix.

daipom avatar Mar 24 '23 07:03 daipom

TODO: Fix other parsers.

daipom avatar Apr 10 '23 09:04 daipom

Sorry for the delay. I'm going to fix this for the next release.

daipom avatar Sep 12 '23 08:09 daipom

I have add new PR for this

  • #4474

daipom avatar Apr 26 '24 10:04 daipom