addressable
addressable copied to clipboard
Addressable::Template#extract don't get value including '@'.
uri = Addressable::Template.new('/{example}/{field}')
param = uri.extract('/x/@y') # => nil
I expected to return ' { "example" => "x", "field" => "@y" } '. I think that extract method doesn't treat '@' as modifier.
@
is in the reserved character set, so {field}
should be pct-encoding it to %40
. If you wanted reserved expansion, you could use {+field}
in your template instead, which returns the expected value.
I'd have to consider this, because it's an extract
call instead of an expand
call, it's a little less straight-forward.
@therabidbanana, thoughts?
Sorry - meant to comment as @therabidbanana, @adstage-david is my work account.
The way I programmed the template extraction was intended to be exactly inverse of template expand. Since expand would have percent encoded that value, the extract method expects it to be percent encoded.
t = Addressable::Template.new('/{example}/{field}')
t.extract("/x/%40y") #=> { "example" => "x", "field" => "@y" }
It would be possible to make it less restrictive. The problem is that if it can match reserved characters it would make matching have odd results - the more restrictive we are with the character sets, the better chance we have at extraction on more complicated queries with the regex approach (which is admittedly probably a bit limited). We'd definitely need it to be an option that was default off. Practical example:
t = Addressable::Template.new("/{example}/{field}{?limit,offset}")
t.extract("/x/%40y?limit=10&offset=0")
#=> {"example" => "x", "field" => "@y", "limit" => 10, "offset" => 0}
t.extract("/x/@y?limit=10&offset=0", match_reserved: true)
#=> {"example" => "x", "field" => "@y?limit=10&offset=0", "limit" => nil, "offset" => nil}
The second call would not be a bug, because in the case of /{example}/{+field}{?limit,offset}
you could have passed the query as part of the field
variable, so there are actually two different but completely valid possible extractions. Unreserved matching makes any remaining variables very difficult to extract.
Now that I've written this out, I'm realizing we're having a lot of issues because of the @
symbol, so maybe we could do some sort of match_reserved option that took a set of allowed characters? This would probably be doable:
t = Addressable::Template.new("/{example}/{field}{?limit,offset}")
t.extract("/x/@y?limit=10&offset=0", match_reserved: "@")
#=> {"example" => "x", "field" => "@y", "limit" => 10, "offset" => 0}
I like the second proposal, though I'd probably want to see it be a bit more intelligent about which component things are being extracted from. For instance, the path component does not require @
to be escaped even though it's reserved:
Addressable::URI.parse('http://example.com/@test').normalize
# => #<Addressable::URI:0x3fc3ea170654 URI:http://example.com/@test>
But yes, OpenSocial, Webfinger, and mailto: are a pain in the butt because of the way they've dealt with the @
character.
Thank you both for digging up this issue. I was hit by it today and finding this existing Issue means I'm not going crazy.. yet :wink:
As above I was trying to pattern match a URL with a query param value whose original form had an @ in it. I was using ActiveSupport's Hash#to_query method to generate the URL, which resulted in it being %40-ed, but I think Webmock or http.rb brought it back to the unescaped version.
I've made this small script to demonstrate the issue:
https://gist.github.com/jwarchol/6a6fc3efa7be054172d7
In my case I've worked around this problem by fudging the email address string as "-at-", in this case it doesn't actually need to be plausibly formatted, but it would be nice if this hack wasn't needed.
This case is unfortunately driven by the underlying spec, which makes it a bit more challenging to resolve @jwarchol. Specifically, Addressable::Template#match
was intended to match URIs built by Addressable::Template#expand
, and #expand
has very specific rules determined by the spec (whereas matching is not part of the spec at all):
https://tools.ietf.org/html/rfc6570#section-3.2.1
The allowed set for a given expansion depends on the expression type: reserved ("+") and fragment ("#") expansions allow the set of characters in the union of ( unreserved / reserved / pct-encoded ) to be passed through without pct-encoding, whereas all other expression types allow only unreserved characters to be passed through without pct-encoding.
and
https://tools.ietf.org/html/rfc6570#section-1.5
pct-encoded = "%" HEXDIG HEXDIG unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
So these rules are together making the webmock experience less straightforward because what are normally valid URIs are actually not valid expansions of Addressable::Template.
There's a few approaches that could be taken to make this more flexible in what it accepts, but we're somewhat limited by the current implementation of matching - which builds a big regex and doesn't really have a concept of the different sections of a URI and what characters are valid in each section.
- Rebuild template matching to take into account valid characters in each part of the url, even if they aren't necessarily valid expansion values. This would be a lot of work, but potentially the right way to approach it. Even if
[email protected]
is not technically a valid expansion of{?email}
, we generally know it should match. This does break the strict meaning of#match
, so maybe two variants of match would need to be built - one for strict matching that would only match valid expansions, and a loose match mode that would try and do what people are trying to do with webmock. - Allow some way to specify valid character sets to match for each template/call to match, to allow building a regex that will properly match in edge cases where the URI in question is not a valid expansion on a case by case basis.
I think @sporkmonger prefers something along the second way, but I've not had time to devote to either approach. I'm going to cross link this to related webmock issue - maybe there are other ideas out there or someone interested in spending time to fix it: https://github.com/bblimke/webmock/issues/468