fluentd
fluentd copied to clipboard
Extend Embedded Ruby Code support for Hashes and Arrays
Which issue(s) this PR fixes: Fixes #4385
What this PR does / why we need it:
It processes the interpolated variables in Hashes and Arrays
This is needed because string interpolation was not working because of # being escaped in the string(happens due to usage of single quotes)
Docs Changes:
NA
Release Note:
NA
@ashie please check this whenever you are free
but at the moment it doesn't seem like a good design decision to give
types.rbthe same functionality asLiteralParser.
I don't yet see the detail but I feel same smell.
@daipom do you suggest that we should handle this scenario in literal parser.rb?
@daipom do you suggest that we should handle this scenario in
literal parser.rb?
Yes. We should start by considering whether there are points in LiteralParser that can be improved.
Oh ok let me look at it today and check if i can move this processing to it
@Athishpranav2003 Thanks so much!
@daipom ignore the prev comments I have corrected it and pushed Also tested locally in my side. Please check it and let me know if there is any issue
@daipom not sure why 2 workflows alone failed, could you check that part alone?
@daipom not sure why 2 workflows alone failed, could you check that part alone?
No problem! They have nothing to do with this PR! Sorry, some tests are unstable and sometimes fail.
Could you add unit test for this?
@ashie added UTs for the same
@Athishpranav2003
@daipom ignore the prev comments I have corrected it and pushed Also tested locally in my side. Please check it and let me know if there is any issue
Thanks! Could you please clarify the specification change? The focus is on what to do with the specifications.
The current specification is:
- String
- non-quote: Simple one-line string without spaces.
- single-quote: Allow spaces or multiple lines.
- double-quote: Allow escaped characters or embedded codes.
- JSON
- The specification seems unclear to me, but looks like it is based on
JSON.parse()and does not support escaped characters or embedded codes.
- The specification seems unclear to me, but looks like it is based on
As far as seeing changes in the code, embedded codes will always be supported, right? If so, I have 2 concerns:
- It will be impossible to write
#{...}as a raw string in JSON. - Not backward compatible.
We need to consider whether the new specifications are acceptable while taking into account these concerns.
Strings stay the same - only double quoted strings support string interpolation
For Hashes and Arrays - We do string interpolation for anything that has #{...} in it. What i can do is maybe restrict this to double quoted strings inside hashes but not sure if its needed. Please let me know on this alone
Tho we have one benefit that the docs present for now we didn't restrict the interpolation to strings so it wont go against the specifications from the past
What i can do is maybe restrict this to double quoted strings inside hashes but not sure if its needed. Please let me know on this alone
This is the difficult part. JSON uses double quotes for strings. So, we can't switch the enabled and disabled of embedded codes with the type of quates...
Strings stay the same - only double quoted strings support string interpolation
For Hashes and Arrays - We do string interpolation for anything that has
#{...}in it.
OK. I think this direction is possible for us! There are concerns(https://github.com/fluent/fluentd/pull/4580#issuecomment-2272700592):
* It will be impossible to write `#{...}` as a raw string in JSON. * Not backward compatible.
but I think these points are less important for this feature.
To write #{...} as is, we can use single-quote to the entire string, such as param '{"k","#{foo}"}'.
So, there is a workaround.
We need to hear from other maintainers on this point.
Fine We can wait for other's to put their opinions
The focus is on what extent we care about the possibility that existing users are using settings like:
<match test>
@type http
endpoint foo
headers {"foo": "foo#{bar"}
</match>
When #{ exists, this change can break the existing setting.
* It will be impossible to write `#{...}` as a raw string in JSON. * Not backward compatible.but I think these points are less important for this feature. To write
#{...}as is, we can use single-quote to the entire string, such asparam '{"k","#{foo}"}'. So, there is a workaround.
I'm not sure but it seems that such workaround isn't available for Array & Hash types.
For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.
diff --git a/test/config/test_config_parser.rb b/test/config/test_config_parser.rb
index fe29028e..b0c18fc9 100644
--- a/test/config/test_config_parser.rb
+++ b/test/config/test_config_parser.rb
@@ -483,8 +483,8 @@ module Fluent::Config
param_size 4k
param_bool true
param_time 10m
- param_hash { "key1": "value1", "key2": 2 }
- param_array ["value1", "value2", 100]
+ param_hash '{ "key1": "value1", "key2": 2 }'
+ param_array '["value1", "value2", 100]'
param_regexp /pattern/
])
target = AllTypes.new.configure(conf)
I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in
config_param.
Sorry, please ignore my comment. Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.
We need to hear from other maintainers on this point.
In general, we intend to keep compatibility so that users can continue to use same major version without changing settings. When we have to change behaviour in any reason, it would be better to show announcements or warnings before making the change, and wait a few minor versions to make the change in actual.
On the other hand, using #{...} in fluentd's setting is expected to evaluate Ruby code in most cases and I guess using such syntax in other purpose seems very rare. If we investigate major plugins and it seems unlikely that such syntax will be used, we can made this change in next minor version (without adding any warning) I guess.
I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in
config_param.Sorry, please ignore my comment. Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.
It would be better adding a test for this.
On the other hand, using
#{...}in fluentd's setting is expected to evaluate Ruby code in most cases and I guess using such syntax in other purpose seems very rare. If we investigate major plugins and it seems unlikely that such syntax will be used, we can made this change in next minor version (without adding any warning) I guess.
BTW my current impression is very positive for including this change in the next minor release (with addressing some corner cases as @daipom mentioned). It seems it's much natural than before for users.
I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in
config_param.Sorry, please ignore my comment. Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.
It would be better adding a test for this.
A test to check whether its parsed as array/hash or string?
A test to check whether its parsed as array/hash or string?
A test that checks whether a single quoted string is correctly parsed as array/hash and #{...}isn't expanded (output as is) when the declared type is array/hash.
@ashie i guess current code renders both " and ' quoted hashes the same :( . This is because we dont handle the case seperately since its optional for config to enclose hash/array in quotes. Not sure which is correct way to solve this
@ashie i guess current code renders both " and ' quoted hashes the same :( . This is because we dont handle the case seperately since its optional for config to enclose hash/array in quotes. Not sure which is correct way to solve this
When we use " or ' to an entire literal, that literal will be handled as String even if it is actually JSON(Array).
LiteralParser does not necessarily parse JSON(Array) as JSON because it will be parsed as JSON later when the type is applied.
https://github.com/fluent/fluentd/blob/5618dd0e86bf0bc47a6bed2da50e37943ac2c8e9/lib/fluent/config/types.rb#L203-L207
https://github.com/fluent/fluentd/blob/master/lib/fluent/config/literal_parser.rb#L67-L78
@daipom
So here we are giving higher precedence to Arrays and JSON than strings. So even if we specify the quotes its gonna get analysed as JSON/Arrays.
After the analysis it JSON.dumps the array which once converts it back to string which is then again handled in config/types.rb
https://github.com/fluent/fluentd/blob/master/lib/fluent/config/literal_parser.rb#L67-L78 @daipom So here we are giving higher precedence to Arrays and JSON than strings. So even if we specify the quotes its gonna get analysed as JSON/Arrays. After the analysis it
JSON.dumpsthe array which once converts it back to string which is then again handled inconfig/types.rb
@Athishpranav2003
If the first character of the literal is [ or {, the literal will be parsed by scan_json().
On the other hand, if the first character is not [ or {, the literal will be parsed by scan_string().
https://github.com/fluent/fluentd/blob/5618dd0e86bf0bc47a6bed2da50e37943ac2c8e9/lib/fluent/config/literal_parser.rb#L70-L76
So, if we use quotes to the entire JSON literal, such as '{"foo":"bar"}' , it will be parsed by scan_string().
So, it will be a workaround to write #{...} as is.
But isn't the code using skip to identify the starting { and [ ? Since it's using the skip the " at the start will anyways match?
@Athishpranav2003 Please try this:
diff --git a/test/config/test_literal_parser.rb b/test/config/test_literal_parser.rb
index c5b9b188..275e468e 100644
--- a/test/config/test_literal_parser.rb
+++ b/test/config/test_literal_parser.rb
@@ -319,6 +319,9 @@ EOA
assert_text_parsed_as('{"a":"foo1"}', '{"a":"foo#{worker_id}"}')
ENV.delete('SERVERENGINE_WORKER_ID')
}
+ test('no quote') { assert_text_parsed_as_json({'a'=>'b','c'=>'test'}, '{"a":"b","c":"#{v1}"}') }
+ test('single quote') { assert_text_parsed_as_json({'a'=>'b','c'=>'#{v1}'}, '\'{"a":"b","c":"#{v1}"}\'') }
+ test('double quote') { assert_text_parsed_as_json({'a'=>'b','c'=>'test'}, '"{\"a\":\"b\",\"c\":\"#{v1}\"}"') }
json_hash_with_comment = <<EOH
{
"a": 1, # this is a
$ bundle exec rake test TEST=test/config/test_literal_parser.rb
All tests will be passed. It means
- test single quote: Embedded ruby code isn't evaludated, the string
#{v1}is output as is. - test double quote: Embedded ruby code is evaludated, the string
#{v1}is expanded astest.