addressable icon indicating copy to clipboard operation
addressable copied to clipboard

Addressable::Template#extract don't get value including '@'.

Open somura opened this issue 11 years ago • 6 comments

uri = Addressable::Template.new('/{example}/{field}')
param = uri.extract('/x/@y') # => nil

I expected to return ' { "example" => "x", "field" => "@y" } '. I think that extract method doesn't treat '@' as modifier.

somura avatar Oct 04 '13 02:10 somura

@ is in the reserved character set, so {field} should be pct-encoding it to %40. If you wanted reserved expansion, you could use {+field} in your template instead, which returns the expected value.

adstage-david avatar Oct 24 '13 21:10 adstage-david

I'd have to consider this, because it's an extract call instead of an expand call, it's a little less straight-forward.

@therabidbanana, thoughts?

sporkmonger avatar Oct 26 '13 14:10 sporkmonger

Sorry - meant to comment as @therabidbanana, @adstage-david is my work account.

The way I programmed the template extraction was intended to be exactly inverse of template expand. Since expand would have percent encoded that value, the extract method expects it to be percent encoded.

    t = Addressable::Template.new('/{example}/{field}')
    t.extract("/x/%40y") #=>  { "example" => "x", "field" => "@y" }

It would be possible to make it less restrictive. The problem is that if it can match reserved characters it would make matching have odd results - the more restrictive we are with the character sets, the better chance we have at extraction on more complicated queries with the regex approach (which is admittedly probably a bit limited). We'd definitely need it to be an option that was default off. Practical example:

t = Addressable::Template.new("/{example}/{field}{?limit,offset}")
t.extract("/x/%40y?limit=10&offset=0") 
  #=> {"example" => "x", "field" => "@y", "limit" => 10, "offset" => 0}
t.extract("/x/@y?limit=10&offset=0", match_reserved: true) 
  #=> {"example" => "x", "field" => "@y?limit=10&offset=0", "limit" => nil, "offset" => nil}

The second call would not be a bug, because in the case of /{example}/{+field}{?limit,offset} you could have passed the query as part of the field variable, so there are actually two different but completely valid possible extractions. Unreserved matching makes any remaining variables very difficult to extract.

Now that I've written this out, I'm realizing we're having a lot of issues because of the @ symbol, so maybe we could do some sort of match_reserved option that took a set of allowed characters? This would probably be doable:

t = Addressable::Template.new("/{example}/{field}{?limit,offset}")
t.extract("/x/@y?limit=10&offset=0", match_reserved: "@") 
  #=> {"example" => "x", "field" => "@y", "limit" => 10, "offset" => 0}

therabidbanana avatar Oct 26 '13 15:10 therabidbanana

I like the second proposal, though I'd probably want to see it be a bit more intelligent about which component things are being extracted from. For instance, the path component does not require @ to be escaped even though it's reserved:

Addressable::URI.parse('http://example.com/@test').normalize
# => #<Addressable::URI:0x3fc3ea170654 URI:http://example.com/@test>

But yes, OpenSocial, Webfinger, and mailto: are a pain in the butt because of the way they've dealt with the @ character.

sporkmonger avatar Oct 28 '13 08:10 sporkmonger

Thank you both for digging up this issue. I was hit by it today and finding this existing Issue means I'm not going crazy.. yet :wink:

As above I was trying to pattern match a URL with a query param value whose original form had an @ in it. I was using ActiveSupport's Hash#to_query method to generate the URL, which resulted in it being %40-ed, but I think Webmock or http.rb brought it back to the unescaped version.

I've made this small script to demonstrate the issue:

https://gist.github.com/jwarchol/6a6fc3efa7be054172d7

In my case I've worked around this problem by fudging the email address string as "-at-", in this case it doesn't actually need to be plausibly formatted, but it would be nice if this hack wasn't needed.

jwarchol avatar Nov 20 '15 18:11 jwarchol

This case is unfortunately driven by the underlying spec, which makes it a bit more challenging to resolve @jwarchol. Specifically, Addressable::Template#match was intended to match URIs built by Addressable::Template#expand, and #expand has very specific rules determined by the spec (whereas matching is not part of the spec at all):

https://tools.ietf.org/html/rfc6570#section-3.2.1

The allowed set for a given expansion depends on the expression type: reserved ("+") and fragment ("#") expansions allow the set of characters in the union of ( unreserved / reserved / pct-encoded ) to be passed through without pct-encoding, whereas all other expression types allow only unreserved characters to be passed through without pct-encoding.

and

https://tools.ietf.org/html/rfc6570#section-1.5

pct-encoded = "%" HEXDIG HEXDIG unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

So these rules are together making the webmock experience less straightforward because what are normally valid URIs are actually not valid expansions of Addressable::Template.

There's a few approaches that could be taken to make this more flexible in what it accepts, but we're somewhat limited by the current implementation of matching - which builds a big regex and doesn't really have a concept of the different sections of a URI and what characters are valid in each section.

  1. Rebuild template matching to take into account valid characters in each part of the url, even if they aren't necessarily valid expansion values. This would be a lot of work, but potentially the right way to approach it. Even if [email protected] is not technically a valid expansion of {?email}, we generally know it should match. This does break the strict meaning of #match, so maybe two variants of match would need to be built - one for strict matching that would only match valid expansions, and a loose match mode that would try and do what people are trying to do with webmock.
  2. Allow some way to specify valid character sets to match for each template/call to match, to allow building a regex that will properly match in edge cases where the URI in question is not a valid expansion on a case by case basis.

I think @sporkmonger prefers something along the second way, but I've not had time to devote to either approach. I'm going to cross link this to related webmock issue - maybe there are other ideas out there or someone interested in spending time to fix it: https://github.com/bblimke/webmock/issues/468

therabidbanana avatar Nov 23 '15 18:11 therabidbanana