fluentd icon indicating copy to clipboard operation
fluentd copied to clipboard

Extend Embedded Ruby Code support for Hashes and Arrays

Open Athishpranav2003 opened this issue 1 year ago • 42 comments

Which issue(s) this PR fixes: Fixes #4385

What this PR does / why we need it: It processes the interpolated variables in Hashes and Arrays This is needed because string interpolation was not working because of # being escaped in the string(happens due to usage of single quotes) Docs Changes: NA Release Note: NA

Athishpranav2003 avatar Aug 06 '24 06:08 Athishpranav2003

@ashie please check this whenever you are free

Athishpranav2003 avatar Aug 06 '24 06:08 Athishpranav2003

but at the moment it doesn't seem like a good design decision to give types.rb the same functionality as LiteralParser.

I don't yet see the detail but I feel same smell.

ashie avatar Aug 06 '24 09:08 ashie

@daipom do you suggest that we should handle this scenario in literal parser.rb?

Athishpranav2003 avatar Aug 06 '24 09:08 Athishpranav2003

@daipom do you suggest that we should handle this scenario in literal parser.rb?

Yes. We should start by considering whether there are points in LiteralParser that can be improved.

daipom avatar Aug 06 '24 09:08 daipom

Oh ok let me look at it today and check if i can move this processing to it

Athishpranav2003 avatar Aug 06 '24 09:08 Athishpranav2003

@Athishpranav2003 Thanks so much!

daipom avatar Aug 06 '24 10:08 daipom

@daipom ignore the prev comments I have corrected it and pushed Also tested locally in my side. Please check it and let me know if there is any issue

Athishpranav2003 avatar Aug 06 '24 16:08 Athishpranav2003

@daipom not sure why 2 workflows alone failed, could you check that part alone?

Athishpranav2003 avatar Aug 07 '24 02:08 Athishpranav2003

@daipom not sure why 2 workflows alone failed, could you check that part alone?

No problem! They have nothing to do with this PR! Sorry, some tests are unstable and sometimes fail.

daipom avatar Aug 07 '24 03:08 daipom

Could you add unit test for this?

ashie avatar Aug 07 '24 04:08 ashie

@ashie added UTs for the same

Athishpranav2003 avatar Aug 07 '24 05:08 Athishpranav2003

@Athishpranav2003

@daipom ignore the prev comments I have corrected it and pushed Also tested locally in my side. Please check it and let me know if there is any issue

Thanks! Could you please clarify the specification change? The focus is on what to do with the specifications.

The current specification is:

  • String
    • non-quote: Simple one-line string without spaces.
    • single-quote: Allow spaces or multiple lines.
    • double-quote: Allow escaped characters or embedded codes.
  • JSON
    • The specification seems unclear to me, but looks like it is based on JSON.parse() and does not support escaped characters or embedded codes.

As far as seeing changes in the code, embedded codes will always be supported, right? If so, I have 2 concerns:

  • It will be impossible to write #{...} as a raw string in JSON.
  • Not backward compatible.

We need to consider whether the new specifications are acceptable while taking into account these concerns.

daipom avatar Aug 07 '24 06:08 daipom

Strings stay the same - only double quoted strings support string interpolation

For Hashes and Arrays - We do string interpolation for anything that has #{...} in it. What i can do is maybe restrict this to double quoted strings inside hashes but not sure if its needed. Please let me know on this alone

Tho we have one benefit that the docs present for now we didn't restrict the interpolation to strings so it wont go against the specifications from the past

Athishpranav2003 avatar Aug 07 '24 07:08 Athishpranav2003

What i can do is maybe restrict this to double quoted strings inside hashes but not sure if its needed. Please let me know on this alone

This is the difficult part. JSON uses double quotes for strings. So, we can't switch the enabled and disabled of embedded codes with the type of quates...

daipom avatar Aug 07 '24 07:08 daipom

Strings stay the same - only double quoted strings support string interpolation

For Hashes and Arrays - We do string interpolation for anything that has #{...} in it.

OK. I think this direction is possible for us! There are concerns(https://github.com/fluent/fluentd/pull/4580#issuecomment-2272700592):

* It will be impossible to write `#{...}` as a raw string in JSON.
* Not backward compatible.

but I think these points are less important for this feature. To write #{...} as is, we can use single-quote to the entire string, such as param '{"k","#{foo}"}'. So, there is a workaround.

We need to hear from other maintainers on this point.

daipom avatar Aug 07 '24 08:08 daipom

Fine We can wait for other's to put their opinions

Athishpranav2003 avatar Aug 07 '24 08:08 Athishpranav2003

The focus is on what extent we care about the possibility that existing users are using settings like:

<match test>
  @type http
  endpoint foo
  headers {"foo": "foo#{bar"}
</match>

When #{ exists, this change can break the existing setting.

daipom avatar Aug 07 '24 08:08 daipom

* It will be impossible to write `#{...}` as a raw string in JSON.
* Not backward compatible.

but I think these points are less important for this feature. To write #{...} as is, we can use single-quote to the entire string, such as param '{"k","#{foo}"}'. So, there is a workaround.

I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.

diff --git a/test/config/test_config_parser.rb b/test/config/test_config_parser.rb
index fe29028e..b0c18fc9 100644
--- a/test/config/test_config_parser.rb
+++ b/test/config/test_config_parser.rb
@@ -483,8 +483,8 @@ module Fluent::Config
           param_size 4k
           param_bool true
           param_time 10m
-          param_hash { "key1": "value1", "key2": 2 }
-          param_array ["value1", "value2", 100]
+          param_hash '{ "key1": "value1", "key2": 2 }'
+          param_array '["value1", "value2", 100]'
           param_regexp /pattern/
         ])
         target = AllTypes.new.configure(conf)

ashie avatar Aug 08 '24 01:08 ashie

I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.

Sorry, please ignore my comment. Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.

ashie avatar Aug 08 '24 01:08 ashie

We need to hear from other maintainers on this point.

In general, we intend to keep compatibility so that users can continue to use same major version without changing settings. When we have to change behaviour in any reason, it would be better to show announcements or warnings before making the change, and wait a few minor versions to make the change in actual.

On the other hand, using #{...} in fluentd's setting is expected to evaluate Ruby code in most cases and I guess using such syntax in other purpose seems very rare. If we investigate major plugins and it seems unlikely that such syntax will be used, we can made this change in next minor version (without adding any warning) I guess.

ashie avatar Aug 08 '24 02:08 ashie

I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.

Sorry, please ignore my comment. Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.

It would be better adding a test for this.

ashie avatar Aug 08 '24 02:08 ashie

On the other hand, using #{...} in fluentd's setting is expected to evaluate Ruby code in most cases and I guess using such syntax in other purpose seems very rare. If we investigate major plugins and it seems unlikely that such syntax will be used, we can made this change in next minor version (without adding any warning) I guess.

BTW my current impression is very positive for including this change in the next minor release (with addressing some corner cases as @daipom mentioned). It seems it's much natural than before for users.

ashie avatar Aug 08 '24 02:08 ashie

I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.

Sorry, please ignore my comment. Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.

It would be better adding a test for this.

A test to check whether its parsed as array/hash or string?

Athishpranav2003 avatar Aug 08 '24 05:08 Athishpranav2003

A test to check whether its parsed as array/hash or string?

A test that checks whether a single quoted string is correctly parsed as array/hash and #{...}isn't expanded (output as is) when the declared type is array/hash.

ashie avatar Aug 08 '24 05:08 ashie

@ashie i guess current code renders both " and ' quoted hashes the same :( . This is because we dont handle the case seperately since its optional for config to enclose hash/array in quotes. Not sure which is correct way to solve this

Athishpranav2003 avatar Aug 08 '24 05:08 Athishpranav2003

@ashie i guess current code renders both " and ' quoted hashes the same :( . This is because we dont handle the case seperately since its optional for config to enclose hash/array in quotes. Not sure which is correct way to solve this

When we use " or ' to an entire literal, that literal will be handled as String even if it is actually JSON(Array). LiteralParser does not necessarily parse JSON(Array) as JSON because it will be parsed as JSON later when the type is applied.

https://github.com/fluent/fluentd/blob/5618dd0e86bf0bc47a6bed2da50e37943ac2c8e9/lib/fluent/config/types.rb#L203-L207

daipom avatar Aug 08 '24 09:08 daipom

https://github.com/fluent/fluentd/blob/master/lib/fluent/config/literal_parser.rb#L67-L78 @daipom So here we are giving higher precedence to Arrays and JSON than strings. So even if we specify the quotes its gonna get analysed as JSON/Arrays. After the analysis it JSON.dumps the array which once converts it back to string which is then again handled in config/types.rb

Athishpranav2003 avatar Aug 08 '24 09:08 Athishpranav2003

https://github.com/fluent/fluentd/blob/master/lib/fluent/config/literal_parser.rb#L67-L78 @daipom So here we are giving higher precedence to Arrays and JSON than strings. So even if we specify the quotes its gonna get analysed as JSON/Arrays. After the analysis it JSON.dumps the array which once converts it back to string which is then again handled in config/types.rb

@Athishpranav2003 If the first character of the literal is [ or {, the literal will be parsed by scan_json(). On the other hand, if the first character is not [ or {, the literal will be parsed by scan_string().

https://github.com/fluent/fluentd/blob/5618dd0e86bf0bc47a6bed2da50e37943ac2c8e9/lib/fluent/config/literal_parser.rb#L70-L76

So, if we use quotes to the entire JSON literal, such as '{"foo":"bar"}' , it will be parsed by scan_string(). So, it will be a workaround to write #{...} as is.

daipom avatar Aug 09 '24 02:08 daipom

But isn't the code using skip to identify the starting { and [ ? Since it's using the skip the " at the start will anyways match?

Athishpranav2003 avatar Aug 09 '24 02:08 Athishpranav2003

@Athishpranav2003 Please try this:

diff --git a/test/config/test_literal_parser.rb b/test/config/test_literal_parser.rb
index c5b9b188..275e468e 100644
--- a/test/config/test_literal_parser.rb
+++ b/test/config/test_literal_parser.rb
@@ -319,6 +319,9 @@ EOA
         assert_text_parsed_as('{"a":"foo1"}', '{"a":"foo#{worker_id}"}')
         ENV.delete('SERVERENGINE_WORKER_ID')
       }
+      test('no quote') { assert_text_parsed_as_json({'a'=>'b','c'=>'test'}, '{"a":"b","c":"#{v1}"}') }
+      test('single quote') { assert_text_parsed_as_json({'a'=>'b','c'=>'#{v1}'}, '\'{"a":"b","c":"#{v1}"}\'') }
+      test('double quote') { assert_text_parsed_as_json({'a'=>'b','c'=>'test'}, '"{\"a\":\"b\",\"c\":\"#{v1}\"}"') }
       json_hash_with_comment = <<EOH
 {
  "a": 1, # this is a
$ bundle exec rake test TEST=test/config/test_literal_parser.rb

All tests will be passed. It means

  • test single quote: Embedded ruby code isn't evaludated, the string #{v1} is output as is.
  • test double quote: Embedded ruby code is evaludated, the string #{v1} is expanded as test.

ashie avatar Aug 09 '24 02:08 ashie