
Escaping URL strings when running get do;end blocks.

Open stanislaw opened this issue 12 years ago • 12 comments

I wonder whether http_spec intentionally does not escape request strings, or whether this could be added as a feature?

For example, if I write this request with the semicolon escaped

get "/api/v1/places?bounds=46.28,30.53%3B46.59,31.73" do
end

then Sinatra (which serves the mock server API I run http_spec against) populates the params hash correctly. If I write the semicolon as is, it does not.

Thanks!

stanislaw avatar Sep 11 '12 19:09 stanislaw

I think URL escaping these strings makes sense, but it is somewhat complicated by the route parameters feature. For example:

get "/foo/:id" do
  do_request :id => 1 # requests "/foo/1" or "/foo/%3Aid"?
end

I'm not sure how to handle this. Escaping inside http_spec is easier for the user, but doesn't lend itself to a clean implementation unless route parameters change.

Ideas?

samwgoldman avatar Sep 11 '12 19:09 samwgoldman

A tricky thing indeed! The only idea I have right now is to escape the parameters part of the request string, i.e. everything after the ? sign, if that part is present. This would at least resolve the case I described initially.

stanislaw avatar Sep 11 '12 22:09 stanislaw

Actually, thinking about the http_spec DSL as it is at the moment, I don't see any other cases where escaping would be required.

stanislaw avatar Sep 11 '12 22:09 stanislaw

How about this?

get "/api/v1/places" do
  # requests "/api/v1/places?bounds=46.28,30.53%3B46.59,31.73"
  do_request :query => { :bounds => "46.28,30.53;46.59,31.73" }
end

The idea here is we escape values before they become part of the URL.

samwgoldman avatar Sep 11 '12 22:09 samwgoldman

Actually, CGI.escape produces "46.28%2C30.53%3B46.59%2C31.73". Does that work in your example app?

samwgoldman avatar Sep 11 '12 22:09 samwgoldman

It works perfectly; that's why I'm suggesting CGI-escaping the params part only.

stanislaw avatar Sep 11 '12 22:09 stanislaw

The idea here is we escape values before they become part of the URL.

It is up to you to decide, but I think params could just be CGI-escaped. That is much cleaner and less intrusive than adding additional options like :query (am I right that no such option exists yet?).

stanislaw avatar Sep 11 '12 22:09 stanislaw

One more important addition:

Only the values for the keys should be encoded, not the whole resulting params string. I've just tested this behaviour: if I CGI-encode the & symbols, Sinatra fails to parse the params correctly.

So my conclusion: if CGI-encoding is ever implemented, then it should be done on the param values one by one.
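To make the "values one by one" idea concrete, here is a minimal sketch. The helper name escape_query_values is hypothetical and not part of http_spec; it assumes the input is not already percent-encoded (an existing % would be escaped again) and that & and = appear only as delimiters.

```ruby
require "cgi"

# Hypothetical helper (not part of http_spec): CGI-escape only the values
# in the query part of a request string, leaving "?", "&" and "=" alone.
def escape_query_values(path)
  base, query = path.split("?", 2)
  return path if query.nil? || query.empty?

  escaped_pairs = query.split("&").map do |pair|
    key, value = pair.split("=", 2)
    # A bare key without "=" is passed through unchanged.
    value.nil? ? key : "#{key}=#{CGI.escape(value)}"
  end

  "#{base}?#{escaped_pairs.join("&")}"
end

puts escape_query_values("/api/v1/places?bounds=46.28,30.53;46.59,31.73")
# prints /api/v1/places?bounds=46.28%2C30.53%3B46.59%2C31.73
```

Because the delimiters are never passed to CGI.escape, a server such as Sinatra still sees separate key/value pairs.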

stanislaw avatar Sep 11 '12 22:09 stanislaw

I've done a bit of research on this, and I think that the best solution for now is to do nothing. Until I can get some advice from an expert on this, I'm going to continue to assume that URLs are properly escaped.

I want to revisit this, so let me jot down the results from my research here:

RFC 3986 §2: "Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data. Once produced, a URI is always in its percent-encoded form."

rack-test: rack-test behaves more like a browser. Requests are made with a "params" argument. When the method is GET, the params are encoded as application/x-www-form-urlencoded, but rack-test makes a framework-specific decision to append "[]" to the parameter key when there are multiple values (NB: I don't want to make those kinds of assumptions). When the request is POST, the params are encoded as either application/x-www-form-urlencoded or as multipart/form-data if any of the values is a File. rack-test also assumes that GET requests have no request body, which is another assumption I don't want to make.

HTML5 Form Submission: This document is helpfully very clear, but as it is an HTML specification and not an HTTP one, I don't know if the solutions there would be a great fit.

URI.encode_www_form and friends: It seems this is preferred to CGI.escape for the purpose of URI encoding and application/x-www-form-urlencoded.
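For comparison, a small sketch of the calls mentioned above, applied to the bounds value from the original report. The variable names are only illustrative. For this particular value both escapers agree; they can differ on other characters, so this is not a general equivalence claim.

```ruby
require "cgi"
require "uri"

value = "46.28,30.53;46.59,31.73"

# Escape a single value with each library.
cgi_escaped = CGI.escape(value)
uri_escaped = URI.encode_www_form_component(value)

# Build a whole query string from structured key/value data.
query = URI.encode_www_form(:bounds => value)

puts cgi_escaped # prints 46.28%2C30.53%3B46.59%2C31.73
puts uri_escaped # prints 46.28%2C30.53%3B46.59%2C31.73
puts query       # prints bounds=46.28%2C30.53%3B46.59%2C31.73
```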

TL;DR: Closing because more feedback is required. Current workaround is to encode all strings manually.

samwgoldman avatar Sep 12 '12 00:09 samwgoldman

It depends on what you want.


First of all, both '/api/v1/places?bounds=46.28,30.53%3B46.59,31.73' and '/api/v1/places?bounds=46.28,30.53;46.59,31.73' are valid URIs. They just give you different things. See, ; is a Reserved Character, and

URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

So. If it's data, it gets escaped, and if it's a delimiter, it doesn't.
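A short sketch of that rule, assuming the query is produced from its component parts with Ruby's standard URI module. The sample keys and values are made up for illustration. The & inside a value is data, so it gets percent-encoded, while the & between pairs is a delimiter and stays as is.

```ruby
require "uri"

# Structured components: here every "&" and ";" inside a value is data.
components = [
  ["name", "Tom & Jerry"],
  ["bounds", "46.28,30.53;46.59,31.73"]
]

# Producing the query: data is escaped, delimiters are emitted raw.
query = URI.encode_www_form(components)
puts query
# prints name=Tom+%26+Jerry&bounds=46.28%2C30.53%3B46.59%2C31.73

# Decoding recovers the original components unambiguously.
puts URI.decode_www_form(query) == components # prints true
```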


(NB: I don't want to make those kinds of assumptions).

All Ruby web stuff expects the [] convention.


also assumes that GET requests have no request body, which is another assumption I don't want to make.

Fielding on this topic:

So, yes, you can send a body with GET, and no, it is never useful to do so.

And httpbis:

Bodies on GET requests have no defined semantics. Note that sending a body on a GET request might cause some existing implementations to reject the request.

steveklabnik avatar Sep 12 '12 00:09 steveklabnik

Seems like it's impossible to automatically escape the string in @stanislaw's example because it's ambiguous whether the ; is intended to be a delimiter or part of the component. The alternative to requiring escaped URIs (the current behavior) is to receive structured data where it is unambiguous what a component is (like the query argument to do_request shown above).

Ruby web stuff (and PHP too) expects the [] convention, but http_spec is intended to be more "universal." Also:

require "uri"
URI.encode_www_form([["foo", "foo1"], ["foo", "foo2"]]) # => "foo=foo1&foo=foo2"

I agree that it's silly to use a body with a GET request, but it is used In The Wild™. This is probably Wrong, but it bums me out to make things impossible. I'm willing to make a trade-off here if there is enough support for it.

I will listen to use cases, implementation suggestions, and :+1:s for this, because I don't feel comfortable making any trade-offs without more use cases.

samwgoldman avatar Sep 12 '12 01:09 samwgoldman

Some good insights here too: https://github.com/technoweenie/faraday/issues/78

Preferences between these?

# foo%5B%5D=bar&foo%5B%5D=baz
do_request :query => [["foo[]", "bar"], ["foo[]", "baz"]]

# foo=bar&baz=quux
do_request :query => { "foo" => "bar", "baz" => "quux" }

vs

# default?
HTTPSpec.query_encoder = BracketizedQueryEncoder

# foo%5B%5D=bar&foo%5B%5D=baz
do_request :query => { "foo" => ["bar", "baz"] }

I am leaning toward the first (less glamorous) option, leveraging URI.encode_www_form.
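A rough sketch of how the first option could work, under the assumption that the :query value is handed straight to URI.encode_www_form. The helper name build_request_path is hypothetical and only illustrates the idea; it is not how http_spec is implemented.

```ruby
require "uri"

# Hypothetical helper (not http_spec's actual code): append an encoded
# query to a path. `query` may be a Hash or an Array of [key, value] pairs.
def build_request_path(path, query = nil)
  return path if query.nil? || query.empty?

  "#{path}?#{URI.encode_www_form(query)}"
end

puts build_request_path("/api/v1/places", :bounds => "46.28,30.53;46.59,31.73")
# prints /api/v1/places?bounds=46.28%2C30.53%3B46.59%2C31.73

puts build_request_path("/foo", [["foo[]", "bar"], ["foo[]", "baz"]])
# prints /foo?foo%5B%5D=bar&foo%5B%5D=baz

puts build_request_path("/foo", "foo" => ["bar", "baz"])
# prints /foo?foo=bar&foo=baz
```

With this approach the caller opts into the [] convention by writing it in the key, and an array value simply repeats the key without brackets, so no framework-specific assumption is baked in.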

samwgoldman avatar Sep 19 '12 05:09 samwgoldman