http_spec
Escaping URL strings when running get do;end blocks.
I wonder whether http_spec intentionally leaves request strings unescaped, or whether escaping could be added as a feature?
For example, if I write this request with the semicolon escaped:
get "/api/v1/places?bounds=46.28,30.53%3B46.59,31.73" do
end
then Sinatra (which serves the mock server API I run http_spec against) populates the params hash correctly. If I write the semicolon as is, it does not.
Thanks!
I think URL escaping these strings makes sense, but it is somewhat complicated by the route parameters feature. For example:
get "/foo/:id" do
do_request :id => 1 # requests "/foo/1" or "/foo/%3Aid"?
end
I'm not sure how to handle this. Escaping inside http_spec is easier for the user, but doesn't lend itself to a clean implementation unless route parameters change.
Ideas?
A tricky thing indeed! The only idea I have right now is to escape the parameters part of the request string, after the ? sign, if that part is present. This would at least resolve the case I described initially.
Actually, thinking about the http_spec DSL as it is now, I don't see any other cases where escaping would be required.
How about this?
get "/api/v1/places" do
# requests "/api/v1/places?bounds=46.28,30.53%3B46.59,31.73"
do_request :query => { :bounds => "46.28,30.53;46.59,31.73" }
end
The idea here is we escape values before they become part of the URL.
Actually, CGI.escape produces "46.28%2C30.53%3B46.59%2C31.73". Does that work in your example app?
It works perfectly; that's why I'm talking about CGI-escaping the params part only.
The idea here is we escape values before they become part of the URL.
It is up to you to decide, but I think the params could just be CGI-escaped. That would be much cleaner and less obtrusive than adding extra options like :query (am I right that no such option exists yet?).
One more important addition:
Only the values for the keys should be encoded, not the whole resulting params string. I've just tested this behaviour: if I CGI-encode the & symbols, Sinatra fails to parse the params correctly.
So my conclusion: if CGI-encoding is ever implemented, it should be done on the param values one by one.
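To illustrate the point above, here is a minimal sketch assuming a hypothetical helper named encode_params (it is not part of http_spec). It escapes each key and value with CGI.escape and only then joins them, so the = and & delimiters stay intact:

```ruby
require "cgi"

# Hypothetical helper, not part of http_spec: escape each key and value
# separately, then join them with the unescaped "=" and "&" delimiters.
def encode_params(params)
  params.map { |key, value| "#{CGI.escape(key.to_s)}=#{CGI.escape(value.to_s)}" }.join("&")
end

puts encode_params({ :bounds => "46.28,30.53;46.59,31.73", :limit => 10 })
# prints bounds=46.28%2C30.53%3B46.59%2C31.73&limit=10

# Escaping the whole string instead also encodes "=" and "&", which leaves
# the server nothing to split the pairs on.
puts CGI.escape("bounds=46.28,30.53;46.59,31.73&limit=10")
# prints bounds%3D46.28%2C30.53%3B46.59%2C31.73%26limit%3D10
```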
I've done a bit of research on this, and I think that the best solution for now is to do nothing. Until I can get some advice from an expert on this, I'm going to continue to assume that URLs are properly escaped.
I want to revisit this, so let me jot down the results from my research here:
RFC 3986 §2 "Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data. Once produced, a URI is always in its percent-encoded form."
rack-test rack-test behaves more like a browser. Requests are made with a "params" argument. When the method is GET, the params are encoded as application/x-www-form-urlencoded, but rack-test makes a framework-specific decision to append "[]" to the parameter key when there are multiple values (NB: I don't want to make those kinds of assumptions). When the request is POST, the params are encoded as either application/x-www-form-urlencoded or as multipart/form-data if any of the values is a File. rack-test also assumes that GET requests have no request body, which is another assumption I don't want to make.
HTML5 Form Submission This document is helpfully very clear, but as it is an HTML specification and not an HTTP one, I don't know if the solutions there would be a great fit.
URI.encode_www_form and friends It seems this is preferred to CGI.escape for the purpose of URI encoding and application/x-www-form-urlencoded.
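For reference, a short sketch of what URI.encode_www_form from Ruby's standard library does with the value from the example earlier in this thread. It encodes keys and values individually, so no hand-written join is needed. The second call uses a made-up parameter just to show how spaces are handled:

```ruby
require "uri"

# Each key and value is encoded on its own; "," and ";" are treated as data.
puts URI.encode_www_form("bounds" => "46.28,30.53;46.59,31.73")
# prints bounds=46.28%2C30.53%3B46.59%2C31.73

# Spaces become "+", as application/x-www-form-urlencoded specifies.
puts URI.encode_www_form("q" => "a b")
# prints q=a+b
```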
TL;DR: Closing because more feedback is required. Current workaround is to encode all strings manually.
It depends on what you want.
First of all, both '/api/v1/places?bounds=46.28,30.53%3B46.59,31.73' and '/api/v1/places?bounds=46.28,30.53;46.59,31.73' are valid URIs. They just give you different things. See, ; is a Reserved Character, and:
URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
So. If it's data, it gets escaped, and if it's a delimiter, it doesn't.
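A small sketch of that distinction using Ruby's standard library. It uses & and = rather than ; because whether ; counts as a delimiter depends on the server's parser, while & and = are delimiters for any form-encoded parser. The parameter names are made up for illustration:

```ruby
require "uri"

# "&" and "=" as data: they are part of the value, so they get encoded.
as_data = URI.encode_www_form([["q", "a&b=c"]])
puts as_data
# prints q=a%26b%3Dc

# Decoding the encoded form recovers the single original pair.
p URI.decode_www_form(as_data)
# prints [["q", "a&b=c"]]

# "&" and "=" as delimiters: left unencoded, they separate two pairs.
p URI.decode_www_form("q=a&b=c")
# prints [["q", "a"], ["b", "c"]]
```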
(NB: I don't want to make those kinds of assumptions).
All Ruby web stuff expects the [] convention.
also assumes that GET requests have no request body, which is another assumption I don't want to make.
So, yes, you can send a body with GET, and no, it is never useful to do so.
And httpbis:
Bodies on GET requests have no defined semantics. Note that sending a body on a GET request might cause some existing implementations to reject the request.
Seems like it's impossible to automatically escape the string in @stanislaw's example because it's ambiguous whether the ; is intended to be a delimiter or part of the component. The only alternative to requiring escaped URIs (current behavior) is to receive structured data where it is unambiguous what a component is (like the query argument to do_request shown above).
Ruby web stuff (and PHP too) expects the [] convention, but http_spec is intended to be more "universal." Also:
URI.encode_www_form([["foo", "foo1"], ["foo", "foo2"]]) # => foo=foo1&foo=foo2
I agree that it's silly to use a body with a GET request, but it is used In The Wild™. This is probably Wrong, but it bums me out to make things impossible. I'm willing to make a trade-off here if there is enough support for it.
I will listen to use cases, implementation suggestions, and :+1:s for this, because I don't feel comfortable making any trade-offs without more use cases.
Some good insights here too: https://github.com/technoweenie/faraday/issues/78
Preferences between these?
# foo%5B%5D=bar&foo%5B%5D=baz
do_request :query => [["foo[]", "bar"], ["foo[]", "baz"]]
# foo=bar&baz=quux
do_request :query => { "foo" => "bar", "baz" => "quux" }
vs
# default?
HTTPSpec.query_encoder = BracketizedQueryEncoder
# foo%5B%5D=bar&foo%5B%5D=baz
do_request :query => { "foo" => ["bar", "baz"] }
I am leaning toward the first (less glamorous) option and leveraging URI.encode_www_form.
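A rough sketch of how the first option might work, assuming a hypothetical helper named path_with_query (this is not http_spec's actual implementation). The path is expected to be already escaped by the caller, and only the structured query is encoded:

```ruby
require "uri"

# Hypothetical helper, not http_spec's actual implementation: append an
# encoded query string to a path the caller has already escaped.
def path_with_query(path, query = nil)
  return path if query.nil? || query.empty?
  "#{path}?#{URI.encode_www_form(query)}"
end

# An array of pairs lets the caller spell out the "[]" convention explicitly.
puts path_with_query("/foo", [["foo[]", "bar"], ["foo[]", "baz"]])
# prints /foo?foo%5B%5D=bar&foo%5B%5D=baz

# A hash works for the simple one-value-per-key case.
puts path_with_query("/foo", { "foo" => "bar", "baz" => "quux" })
# prints /foo?foo=bar&baz=quux
```

With this shape, the "[]" suffix is the caller's choice, so the library makes no framework-specific assumption about repeated keys.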