http
http copied to clipboard
HTTP.base_url API
This is a discussion of API that overtakes exisitng PRs (#512 and #493) in order to plan and clarify API that needs to be implemented.
First of all I propose method name to be base_uri as it better reflects behaviour. I would agree on .origin proposed by @tarcieri, but I think it might be confusing due to URI API:
HTTP::URI.parse("https://example.com/foo/bar").origin
# => "https://example.com"
Thus I believe the best name will be base_uri. Also, this method should raise exception if schema-less URI given:
HTTP.base_uri("//example.com/users")
# !! ArgumentError: Invalid base URI: //example.com/users
HTTP.base_uri("/users")
# !! ArgumentError: Invalid base URI: /users
Also, it should allow chaining of base URIs:
HTTP.base_uri("https://example.com").base_uri("/users")
# => HTTP::Client with base URI="https://example.com/users
URI merging
IMO using Addressable's URI merging logic is straight forward:
uri = HTTP::URI.parse("https://example.com/a/b/c/")
uri.join("z") # => https://example.com/a/b/c/z
uri.join("/z") # => https://example.com/z
uri.join("./z") # => https://example.com/a/b/c/z
uri.join("../z") # => https://example.com/a/b/z
uri.join("//foobar.com/ah/a/") # => https://foobar.com/ah/a
uri.join("http://wut.wut/") # => http://wut.wut
There's one thing that should be handled separately, and we should provide shim for this:
HTTP::URI.parse("https://example.com/foo/bar").join("./baz")
# => https://example.com/foo/baz
HTTP::URI.parse("https://example.com/foo/bar/").join("./baz")
# => https://example.com/foo/bar/baz
The above snippet, IMO, should lead to the same result (foo/bar/baz) in both cases. So we should somewhat normalize base uri to have tailing / if there's none:
def merge_uris(base, appendix)
base = HTTP::URI.parse(base)
base = HTTP::URI.parse("#{base}/") unless base.to_s.end_with? "/"
base.join(appendix)
end
merge_uris("https://example.com/foo/bar", "baz") # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar", "./baz") # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar", "../baz") # => https://example.com/foo/baz
merge_uris("https://example.com/foo/bar", "/baz") # => https://example.com/baz
merge_uris("https://example.com/foo/bar/", "baz") # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar/", "./baz") # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar/", "../baz") # => https://example.com/foo/baz
merge_uris("https://example.com/foo/bar/", "/baz") # => https://example.com/baz
Security consideration
I think it's at least worth pointing out (not as a blocker on this PR, just as a general note) that path joining is an area that's fraught with peril, especially if the base URI is assumed to be "secure" but the joined parameter is attacker-controlled and the attacker can exploit either relative path behaviors or completely overwrite an existing path with an absolute one. -- https://github.com/httprb/http/pull/513#issuecomment-446027345
I think if developer is not validating user input - we can't prevent him from stepping a land mine. Consider this dangerous code:
HTTP.base_uri("https://example.com/foo/bar").get(user_input)
The above hides same danger as:
base_uri = HTTP::URI.parse("https://example.com/foo/bar')
HTTP.get(base_uri.join(user_input)
And of course if one is just concatenating strings he's not doing it much better either:
HTTP.get("#{base_uri}#{user_input}")
What we can do better is to allow "locking" base uri, so that it will raise exception on attempt to get unexpected URI:
client = HTTP.base_uri("https://example.com/foo/bar", :enforced => true)
client.get("baz")
# => GET https://example.com/foo/bar/baz
client.get("../baz")
# !! "https://example.com/foo/baz" is outside of base uri
How about persistent options
Something that was completely missed in opened PRs and discussions is how to handle when both persistent and base URI are given. I think this should be handled this way:
client = HTTP.base_uri("https://example.com/foo/bar")
client.persistent("https://gitlab.com")
# !! ArgumentError, persistence origin mismatch base URI
pclient = HTTP.persistent("https://gitlab.com")
pclient.base_uri("https://example.com/foo/bar")
# !! ArgumentError, base URI is outide of persistence origin
In other words we should raise an error when there's conflict between the two. And in order to make life a bit easier we can allow:
pclient = HTTP.base_uri("https://example.com/foo/bar").persistent
In other words, if there's base URI, persistent can get its origin by default.
/cc @httprb/core, @tarcieri, @bjeanes, @paul, @sj26
Overtakes #493, #512, #513
I would agree on .origin proposed by @tarcieri, but I think it might be confusing due to URI API
You're right. origin is fully qualified, so base_uri makes more sense for things that can be relative. 👍
👍
The above snippet, IMO, should lead to the same result (
foo/bar/baz) in both cases. So we should somewhat normalize base uri to have tailing /`` if there's none:
Personally, I think it would be better to be consistent with URI here. I agree that most people probably always mean to append, but how URI works is consistent with how browsers would follow links and with other HTTP libraries (in my experience).
In other words, if there's base URI, persistent can get its origin by default.
🙇 yes!
Seems pretty great, I like the "fix" to how addressable handles #join. I'm often doing something like:
base = A::URI.parse("https://foo.example/api/v2")
http.get(base.join("users")) # 404 on https://foo.example/api/users
... and I am always surprised by the behavior.
:+1:
"Origin" has a very particular meaning in HTTP which doesn't include paths, it's a tuple of scheme, host and port, so I like "base uri" as a name. It also gels with the HTML base element. :+1:
URI resolution (joining) is a well known and specified thing. I'd expect URI joining to work the same as URI.join or the browser URL resolver.
@sj26, I agree that we should avoid introducing new "standard" and follow RFC, which I believe Addressable::URI#join is following.

@paul I must admit I'm also surprised everytime, but, I guess it's better to follow RFC as was mentioned above.
I find it very frustrating if I approach it as a Ruby-ist expecting it to work like Array#join, but imho it makes sense in the context of a URI and I like that it functions per the URI specification.
Agree. Although I also find URI.join a bit weird, especially when comparing it to File.join, but as we are handling URIs here we should follow RFC of URI joining. After all we do follow RFC when we follow redirects.
I was surprised there was no way to set a base path for a persistent connection and found this. Seems like a simple use case, creating a pool of persistent connections to a host on /api and making requests like get("users/123") and it actually calling GET /api/users/123. I was surprised that persistent(url) actually dropped the path entirely.
@mikebaldry as I hope you can see from this thread (and #512 and #493), it's a case of "easier said than done"