rack
rack copied to clipboard
Nested parameters parsing error in rack 3.0.8
Problem description
We are planning an upgrade to Rails 7.1.1. The upgrade involves updating the rack version to the current one (3.0.8) However, some of the ours forms that use POST request stopped working. Investigating this issue led to the fact that the parsing of parameters passed by the form to rack is not done correctly. When using rack version 2.2.8 everything works fine. The error occurs when using field names of the field_name[index] type in the form, which is used to work with JSON arrays. A test example to reproduce the error.
Reproduce error
>> Rack::RELEASE
=> "2.2.8"
>> Rack::VERSION
=> [1, 3]
>> qs = 'authenticity_token=cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg&commit=%D0%A1%D0%BE%D0%B7%D0%B4%D0%B0%D1%82%D1%8C+%D0%B2%D0%B5%D1%80%D1%81%D0%B8%D1%8E+%D1%81%D1%82%D0%B0%D1%82%D1%8C%D0%B8&article_version%5Btitle%5D=&article_version%5Barticle_code_prefix%5D=ZDR_&article_version%5Bclassifier_ids%5D%5B%5D=&article_version%5Bquestionnaire_id%5D=&article_version%5Brecord_type_id%5D=&article_version%5Btag_plus%5D=0&article_version%5Btag_plus%5D=1&article_version%5Bcomment%5D=&article_version%5Bbody%5Banswer1%5D%5D=&article_version%5Bvirtual_linked_articles_codes%5D%5B%5D=&article_version%5Bdouble_linked_codes%5D%5B%5D=&article_version%5Bquestions%5D%5B%5D='
=> "authenticity_token=cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg&commit=%D0%A1%D0%BE%D0%B7%...
>>Test_parser = Class.new(Rack::QueryParser::Params)
=> Test_parser
>> p = Rack::QueryParser.new(Test_parser,10000,10000)
=> #<Rack::QueryParser:0x000000010c2f8818 @key_space_limit=10000, @param_depth_limit=10000, @params_class=Test_parser>
>> p.parse_nested_query(qs)
=>
{"authenticity_token"=>"cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg",
"commit"=>"Создать версию статьи",
"article_version"=>
{"title"=>"",
"article_code_prefix"=>"ZDR_",
"classifier_ids"=>[""],
"questionnaire_id"=>"",
"record_type_id"=>"",
"tag_plus"=>"1",
"comment"=>"",
"body"=>{"answer1"=>""},
"virtual_linked_articles_codes"=>[""],
"double_linked_codes"=>[""],
"questions"=>[""]}}
>>
The same example for rack version 3.0.8
>> Rack::RELEASE
=> "3.0.8"
>> Rack::VERSION
=> [1, 3]
>> qs = 'authenticity_token=cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg&commit=%D0%A1%D0%BE%D0%B7
%D0%B4%D0%B0%D1%82%D1%8C+%D0%B2%D0%B5%D1%80%D1%81%D0%B8%D1%8E+%D1%81%D1%82%D0%B0%D1%82%D1%8C%D0%B8&article_version%5Btitle%5D=&article_versio
n%5Barticle_code_prefix%5D=ZDR_&article_version%5Bclassifier_ids%5D%5B%5D=&article_version%5Bquestionnaire_id%5D=&article_version%5Brecord_ty
pe_id%5D=&article_version%5Btag_plus%5D=0&article_version%5Btag_plus%5D=1&article_version%5Bcomment%5D=&article_version%5Bbody%5Banswer1%5D%5
D=&article_version%5Bvirtual_linked_articles_codes%5D%5B%5D=&article_version%5Bdouble_linked_codes%5D%5B%5D=&article_version%5Bquestions%5D%5
B%5D='
=> "authenticity_token=cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg&commit=%D0%A1%D0%BE%D0%B7%...
>> Test_parser = Class.new(Rack::QueryParser::Params)
(irb):15: warning: already initialized constant Test_parser
(irb):4: warning: previous definition of Test_parser was here
=> Test_parser
>> p = Rack::QueryParser.new(Test_parser,10000,10000)
(irb):16: warning: `second argument `key_space limit` is deprecated and no longer has an effect. Please call with only two arguments, which will be required in a future version of Rack
=> #<Rack::QueryParser:0x00007f52d72ad038 @param_depth_limit=10000, @params_class=Test_parser>
>> p.parse_nested_query(qs)
=>
{"authenticity_token"=>"cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg",
"commit"=>"Создать версию статьи",
"article_version"=>
{"title"=>"",
"article_code_prefix"=>"ZDR_",
"classifier_ids"=>[""],
"questionnaire_id"=>"",
"record_type_id"=>"",
"tag_plus"=>"1",
"comment"=>"",
"body[answer1"=>{"]"=>""},
"virtual_linked_articles_codes"=>[""],
"double_linked_codes"=>[""],
"questions"=>[""]}}
>>
Note the "body[answer1"=>{"]"=>""}, - this is a field parsed with error.
It would be appreciated if this error could be fixed
URI.decode_uri_component("article_version%5Bbody%5Banswer1%5D%5D=")
=> "article_version[body[answer1]]="
I believe the format we accept is:
"article_version[body][answer1]="
@jeremyevans WDYT?
@ioquatix
As I mentioned above, this is the standard format for generating forms in Rails, and it works fine with rack version 2.2.8. The result should be exactly as shown above
"body"=>{"answer1"=>""},
In the html code it looks like this.
<textarea class="tinymce" placeholder="Enter article text" data-add-field-target="answer" data-tinymce-target="input" name="article_version[body[answer1]]" id="article_version_body[answer1]" aria-hidden="true" style="display: none;"></textarea>
name="article_version[body[answer1]]"
At least not like this, anyway. --> "body[answer1"=>{"]"=>""},
this is the standard format for generating forms in Rails
No, it is not.
I'm very surprised that format has ever parsed predictably. I guess we'll need to dig into history to figure out whether it was totally accidental, or deliberately accommodated as a variant format. The lack of tests suggests the former, though it honestly seems hard to accidentally write code that would handle it.
Hi, @matthewd
I won't insist, but it is quite logical to use this format for some fields. It is especially useful for various JSON fields, when the value for each required key is output separately. In any case, parsing should not produce the given result without any errors, because it produce long backtrace stack and give incorrect errors on users side. I would like to note that sending to the "back" side works fine in both versions. I can use any other format for this fields you suggest, but the field name must contain the name and the key by which the value of this field is determined.
Some additional. As you can see, another method in rack v3.0.8 works fine with the same form-data, but rails itself use parse_nested_query "article_version[body[answer1]]"=>""
>> p.parse_query(qs)
=>
{"authenticity_token"=>"cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg",
"commit"=>"Создать версию статьи",
"article_version[title]"=>"",
"article_version[article_code_prefix]"=>"ZDR_",
"article_version[classifier_ids][]"=>"",
"article_version[questionnaire_id]"=>"",
"article_version[record_type_id]"=>"",
"article_version[tag_plus]"=>["0", "1"],
"article_version[comment]"=>"",
"article_version[body[answer1]]"=>"",
"article_version[virtual_linked_articles_codes][]"=>"",
"article_version[double_linked_codes][]"=>"",
"article_version[questions][]"=>""}
>> p.parse_nested_query(qs)
=>
{"authenticity_token"=>"cATU9VIIozE3HDwS_kVfbyJpRkDlHUhJH3pfZllA-CSPm6L7fkOF-fQFByLv75mFP1uCgW64JyCi5Z4YTloaIg",
"commit"=>"Создать версию статьи",
"article_version"=>
{"title"=>"",
"article_code_prefix"=>"ZDR_",
"classifier_ids"=>[""],
"questionnaire_id"=>"",
"record_type_id"=>"",
"tag_plus"=>"1",
"comment"=>"",
"body[answer1"=>{"]"=>""},
"virtual_linked_articles_codes"=>[""],
"double_linked_codes"=>[""],
"questions"=>[""]}}
https://guides.rubyonrails.org/form_helpers.html#understanding-parameter-naming-conventions
name="person[address][city]"
Inside view
<div class='answer_body' data-controller='tinymce'>
<%= form.text_area "body[#{k}]", class: 'tinymce', placeholder: t(:body_prompt), value: @article_version.body[k], data: { add_field_target: 'answer', tinymce_target: 'input' } %>
This is a bug in the application that needs to be fixed, not a bug in Rack. That it worked in older versions of Rack is accidental.
This is a bug in the application that needs to be fixed, not a bug in Rack. That it worked in older versions of Rack is accidental.
@jeremyevans,
I draw my conclusions based on the fact that TWO methods in rack (see above) produce DIFFERENT results for the same input data. If this is the correct behaviour from your point of view - perhaps it should be reflected in the gem description. From my point of view, two methods that differ only in output data format should produce identical results. Another result - is a BUG. Besides, RFC description of form-data does not contain restrictions on variable naming and the task of parsing is just to return variable names as they are in the original query string. A parser that CHANGES the variable name and passes a PART of this name does not work correctly.
parse_query gives no meaning to the content of the field names, so it returns a flat hash containing the names as supplied (including [ and ] characters). That's the RFC-matching behaviour.
parse_nested_query, by design and documentation, applies additional parsing to the field names, giving special meaning to [ and ] characters, to construct a nested data structure. For that behaviour to work, the [ and ] characters have to be used in a way that is syntactically valid not just for the RFC, but also for the nested-query behaviour.
The whole point of having the two methods is the fact that they do different things. You can already see that happening with "article_version[title]"=>"" vs "article_version"=>{"title"=>""}, regardless of the broken example.
@matthewd ,
Thank you for the detailed explanations of the difference in the objectives of the methods. In version 2.2.8, the field named ""article_version[body[answer1]]"=>"" returned by parse_query was successfully and perfectly correctly transformed into "article_version"=> {"body"=>{"answer1"=>""}} by the parse_nested_query method. What prevents you from doing it in the current version ? This looks very logical and allows a shorter version of view generation for array and hash fields. Especially now that these types of fields are being used more and more in various forms. In any case, parsing a field name should not change that name, from my point of view. You can produce a parsing error - that's also an option, but you can't produce a WRONG field name as a result of the method.
We could change article_version[body[answer1]]= to parse as {"article_version"=>{"body[answer1]"=>""}} instead, by keeping count of the number of [ and ]. That would make slightly more sense than the current parsing. But we aren't going to support parsing it as {"article_version"=>{"body"=>{answer1"=>""}}}.
However, I'm not sure trying to support {"article_version"=>{"body[answer1]"=>""}} is worth the complexity it would entail. @ioquatix @matthewd your thoughts?
@jeremyevans
Another way leads to increased complexity of forms, when there is a layout of nested elements of standard arrays and hash into simple names, which leads to an additional "layer" of processing and errors. It seems to me a better way to still process the transfer (it already works in all browsers) and parsing of such fields. I would be grateful for your support of my approach. :-)
To reiterate for clarity, we aren't going to support parsing article_version[body[answer1]]= as {"article_version"=>{"body"=>{answer1"=>""}}}. Trying to support that brings more complexity and results in two separate ways to support the same thing, which we do not want.
If you do not want to change the parameter keys you are using, please switch from using parse_nested_query to parse_query, and then handle the nesting of keys yourself.
@jeremyevans
As I said, we are using rails 7.0.8, which in turn uses rack 2.2.8, and everything is working fine. With the release of the new Rails 7.1.1, a scheduled upgrade has resulted in a change of rack version to 3.0.8, which in turn has resulted in a call to you.
The option you suggested is possible, but it is not the right way to customise Rails.
If you solve the problem with at least the correct field name - I will consider how to better solve the overall problem. The last thing I want to do is to make any custom solutions for docking two gems.
The option of changing the design of forms is also possible, as I said, the problem is that we have a lot of these forms, we actively use arrays and hashes.
Leaving rack in version 2.2.8 is also a possible option, but also not very correct from my point of view for further update of the stack, because the next updates may require the current version of rack.
If you use the correct parameter name, article_version[body][answer1]=, then it will work in all versions of Rack. Please understand that the fact that article_version[body[answer1]]= worked in older versions of Rack was not by design. You were relying on undefined behavior.
I'm sorry if you have a lot of forms that are doing things incorrectly. It would have better if older Rack did not unexpectedly handle the format you are using by accident. However, there is nothing we can do about that now. Your best approach forward is to fix the parameter names in all of your forms.
@jeremyevans
Yes, I understand your position very well. It's not about how to pass the name in the form, I can make any custom name. The problem is a bit different - if the form name doesn't match the value of the variable, the field won't populate when an existing record is opened.
The field is a hash. If you use a direct normal reference - it will look like the usual body['answer1'] in ruby. Full reference looks like article_version.body['answer1'] or article_version[body['answer1'] ] for active record.
So it is only a matter of "parsing" hash in the form itself and custom solutions ala OpenStruct to support filling key fields with the values of these keys.
In case of at least some parsing of the field name from your side, I will try to implement the logic at least at the controller or model level.
I hope, I have clearly explained the essence of the problems arising from the lack of such parsing.
I was very happy when in version 2.2.8 the construction with the field name as a direct reference to the hash value worked. It really removes a lot of problems when working with arrays and hash in any forms.
And your words that it was nothing more than an accident made me very sad.
And your words that it was nothing more than an accident made me very sad.
I'm sorry to hear that. I realize this is disappointing to you, and could result in a lot of additional work.
And your words that it was nothing more than an accident made me very sad.
I'm sorry to hear that. I realize this is disappointing to you, and could result in a lot of additional work.
Don't worry, nobody cancelled monkeypath :-)
I actually really like the existing solution in 2.2.8.
Anyway, if you somehow fix the output of the parsing results as the correct field name, there will be an opportunity to think about the problem again from all sides.
Thanks for participating in solving our problems, although I believe we are not the only ones with such problems :-)
However, I'm not sure trying to support
{"article_version"=>{"body[answer1]"=>""}}is worth the complexity it would entail.
Yeah, that would be a fully new feature, and I don't think that's worth the huge slowdown it would impose on all parameter parsing.
If there were evidence that the previous behaviour was undocumented-but-originally-intended, then I might be inclined to re-support it for a while, with a gentler deprecation path. That would at least [presumably] still be possible with simple (regex) splits.
Seems omniauth may have been relying on this behavior as well. Not totally sure if it's the same, but see referenced issue https://github.com/cookpad/omniauth-rails_csrf_protection/issues/15.
For reference, this change is caused by this PR: https://github.com/rack/rack/pull/1797
We had a wrongly formatted input field name and fixed it to the correct one
See: https://github.com/solidusio-contrib/solidus_prototypes/pull/65/files
TL;DR
Use some[nested][param] instead of some[nested[param]]
I personally believe it would be advantageous to define a proper grammar for parameter parsing (ideally using Ragel). In fact, I already did that: https://github.com/socketry/xrb/blob/main/parsers/xrb/query.rl
This grammar would not parse nested square brackets:
XRB::Query.parse(XRB::Buffer "a[b]=c")
# => {:a=>{:b=>"c"}}
XRB::Query.parse(XRB::Buffer "a[b[c]]=x")
# /Users/samuel/.gem/ruby/3.3.0/gems/xrb-0.6.1/lib/xrb/query.rb:15:in `parse_query': <string>[1:3]: could not parse all input (XRB::ParseError)
Unfortunately, Rack's parser was too loose, and we see another instance of Hyrum's Law.
The question now is do we officially support such a format?
- We know this is a breaking change for some applications trying to upgrade to Rack 3.x. In addition, without sufficient unit testing, it may not be obvious that this is broken.
- There may be a performance cost for supporting an updated specification which supports nested square brackets. However, IMHO, using a formal grammar compiled to a native extension should mitigate performance issues (but may not be something we want to adopt in Rack).
- There still appears to be some ambiguity around how such fields should be parsed, e.g.
a[b[c]]=xcould be parsed as{a: {b: {c: 'x'}}or{a: {'b[c]': x}} - What does the standard say about this, e.g. we shouldn't encourage a bespoke format which is incompatible with
www-form-urlencodedor similar standards.
We can see from the "standard":
The
application/x-www-form-urlencodedformat is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.
Unfortunately the guidance from x-www-form-urlencoded is limited to key-value parsing and makes no assumptions about the nature of "square brackets".
Falling back to the time-tested PHP:
php > parse_str("a[b[c]]=x", $params);
php > print_r($params);
Array
(
[a] => Array
(
[b[c] => x
)
)
php > parse_str("a[b][c]=x", $params);
php > print_r($params);
Array
(
[a] => Array
(
[b] => Array
(
[c] => x
)
)
)
It does something even more weird when faced with a[b[c]] ignoring the trailing ].
In summary:
- Sergey Arkhipov: Highlights that the change breaks existing forms in Rails, which worked fine in Rack 2.2.8. He argues for maintaining backward compatibility to avoid extensive refactoring of forms.
- Jeremy Evans: Emphasizes that the current behavior in Rack 3.0.8 is by design and matches the correct parameter naming conventions. He suggests using valid parameter names like
article_version[body][answer1]. - Matthew Draper: Clarifies that the previous behavior was not by design and that
parse_nested_queryandparse_queryhave different purposes. - Samuel Williams: I believe a formal specification and better error handling (rejecting the invalid format) is appropriate.
In other words, I would be in favour of
Rack::Utils.parse_nested_query("a[b[c]]=x")
# => {"a"=>{"b[c"=>{"]"=>"x"}}}
Raising an error instead.
Rack's parameter parsing has always had the idea of never raising an error based on the format of the parameter name. I'm on the fence regarding changing the parsing to make it strict, but it's a big change and I think could only be done in a major version upgrade.
On the face of it, all of these look a bit wrong to me:
irb(main):002> Rack::Utils.parse_nested_query("a]=10")
=> {"a]"=>"10"}
irb(main):003> Rack::Utils.parse_nested_query("a][=10")
=> {"a]["=>"10"}
irb(main):004> Rack::Utils.parse_nested_query("a][=10]")
=> {"a]["=>"10]"}
irb(main):005> Rack::Utils.parse_nested_query("a][=[10]")
=> {"a]["=>"[10]"}
irb(main):006> Rack::Utils.parse_nested_query("[a][=[10]")
=> {"[a]["=>"[10]"}
irb(main):007> Rack::Utils.parse_nested_query("[a]][=[10]")
=> {"[a]]["=>"[10]"}
irb(main):008> Rack::Utils.parse_nested_query("[a]][===[10]")
=> {"[a]]["=>"==[10]"}
Of course, our interpretation of the key and value is up to us, and the specification for the query part of the URL is unfortunately loose.
In other words, you are asserting that:
Rack::Utils.parse_nested_query("a[b[c]]=x")
# => {"a"=>{"b[c"=>{"]"=>"x"}}}
is the best we can do?
Okay, what about introducing a strict mode to the parser and defining a formal grammar/structure? Then, we can start off making it opt-in.
In other words, you are asserting that:
Rack::Utils.parse_nested_query("a[b[c]]=x") # => {"a"=>{"b[c"=>{"]"=>"x"}}}is the best we can do?
I think it's best unless we want to start rejecting parameter name formats, which we've never done in the past.
Okay, what about introducing a strict mode to the parser and defining a formal grammar/structure? Then, we can start off making it opt-in.
I'm open to that.