http icon indicating copy to clipboard operation
http copied to clipboard

HTTP.base_url API

Open ixti opened this issue 6 years ago • 12 comments

This is a discussion of API that overtakes exisitng PRs (#512 and #493) in order to plan and clarify API that needs to be implemented.

First of all I propose method name to be base_uri as it better reflects behaviour. I would agree on .origin proposed by @tarcieri, but I think it might be confusing due to URI API:

HTTP::URI.parse("https://example.com/foo/bar").origin
# => "https://example.com" 

Thus I believe the best name will be base_uri. Also, this method should raise exception if schema-less URI given:

HTTP.base_uri("//example.com/users")
# !! ArgumentError: Invalid base URI: //example.com/users

HTTP.base_uri("/users")
# !! ArgumentError: Invalid base URI: /users

Also, it should allow chaining of base URIs:

HTTP.base_uri("https://example.com").base_uri("/users")
# => HTTP::Client with base URI="https://example.com/users

URI merging

IMO using Addressable's URI merging logic is straight forward:

uri = HTTP::URI.parse("https://example.com/a/b/c/")

uri.join("z")     # => https://example.com/a/b/c/z
uri.join("/z")    # => https://example.com/z
uri.join("./z")   # => https://example.com/a/b/c/z
uri.join("../z")  # => https://example.com/a/b/z

uri.join("//foobar.com/ah/a/") # => https://foobar.com/ah/a
uri.join("http://wut.wut/")    # => http://wut.wut

There's one thing that should be handled separately, and we should provide shim for this:

HTTP::URI.parse("https://example.com/foo/bar").join("./baz")
# => https://example.com/foo/baz


HTTP::URI.parse("https://example.com/foo/bar/").join("./baz")
# => https://example.com/foo/bar/baz

The above snippet, IMO, should lead to the same result (foo/bar/baz) in both cases. So we should somewhat normalize base uri to have tailing / if there's none:

def merge_uris(base, appendix)
  base = HTTP::URI.parse(base)
  base = HTTP::URI.parse("#{base}/") unless base.to_s.end_with? "/"

  base.join(appendix)
end

merge_uris("https://example.com/foo/bar", "baz")      # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar", "./baz")    # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar", "../baz")   # => https://example.com/foo/baz
merge_uris("https://example.com/foo/bar", "/baz")     # => https://example.com/baz

merge_uris("https://example.com/foo/bar/", "baz")     # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar/", "./baz")   # => https://example.com/foo/bar/baz
merge_uris("https://example.com/foo/bar/", "../baz")  # => https://example.com/foo/baz
merge_uris("https://example.com/foo/bar/", "/baz")    # => https://example.com/baz

Security consideration

I think it's at least worth pointing out (not as a blocker on this PR, just as a general note) that path joining is an area that's fraught with peril, especially if the base URI is assumed to be "secure" but the joined parameter is attacker-controlled and the attacker can exploit either relative path behaviors or completely overwrite an existing path with an absolute one. -- https://github.com/httprb/http/pull/513#issuecomment-446027345

I think if developer is not validating user input - we can't prevent him from stepping a land mine. Consider this dangerous code:

HTTP.base_uri("https://example.com/foo/bar").get(user_input)

The above hides same danger as:

base_uri = HTTP::URI.parse("https://example.com/foo/bar')
HTTP.get(base_uri.join(user_input)

And of course if one is just concatenating strings he's not doing it much better either:

HTTP.get("#{base_uri}#{user_input}")

What we can do better is to allow "locking" base uri, so that it will raise exception on attempt to get unexpected URI:

client = HTTP.base_uri("https://example.com/foo/bar", :enforced => true)

client.get("baz")
# => GET https://example.com/foo/bar/baz

client.get("../baz")
# !! "https://example.com/foo/baz" is outside of base uri

How about persistent options

Something that was completely missed in opened PRs and discussions is how to handle when both persistent and base URI are given. I think this should be handled this way:

client = HTTP.base_uri("https://example.com/foo/bar")
client.persistent("https://gitlab.com")
# !! ArgumentError, persistence origin mismatch base URI

pclient = HTTP.persistent("https://gitlab.com")
pclient.base_uri("https://example.com/foo/bar")
# !! ArgumentError, base URI is outide of persistence origin

In other words we should raise an error when there's conflict between the two. And in order to make life a bit easier we can allow:

pclient = HTTP.base_uri("https://example.com/foo/bar").persistent

In other words, if there's base URI, persistent can get its origin by default.


/cc @httprb/core, @tarcieri, @bjeanes, @paul, @sj26

Overtakes #493, #512, #513

ixti avatar Jan 16 '19 21:01 ixti

I would agree on .origin proposed by @tarcieri, but I think it might be confusing due to URI API

You're right. origin is fully qualified, so base_uri makes more sense for things that can be relative. 👍

tarcieri avatar Jan 16 '19 21:01 tarcieri

👍

The above snippet, IMO, should lead to the same result (foo/bar/baz) in both cases. So we should somewhat normalize base uri to have tailing /`` if there's none:

Personally, I think it would be better to be consistent with URI here. I agree that most people probably always mean to append, but how URI works is consistent with how browsers would follow links and with other HTTP libraries (in my experience).

In other words, if there's base URI, persistent can get its origin by default.

🙇 yes!

bjeanes avatar Jan 16 '19 21:01 bjeanes

Seems pretty great, I like the "fix" to how addressable handles #join. I'm often doing something like:

base = A::URI.parse("https://foo.example/api/v2")
http.get(base.join("users"))  # 404 on https://foo.example/api/users

... and I am always surprised by the behavior.

:+1:

paul avatar Jan 16 '19 21:01 paul

"Origin" has a very particular meaning in HTTP which doesn't include paths, it's a tuple of scheme, host and port, so I like "base uri" as a name. It also gels with the HTML base element. :+1:

URI resolution (joining) is a well known and specified thing. I'd expect URI joining to work the same as URI.join or the browser URL resolver.

sj26 avatar Jan 17 '19 00:01 sj26

@sj26, I agree that we should avoid introducing new "standard" and follow RFC, which I believe Addressable::URI#join is following.


ixti avatar Jan 17 '19 00:01 ixti

@paul I must admit I'm also surprised everytime, but, I guess it's better to follow RFC as was mentioned above.

ixti avatar Jan 17 '19 00:01 ixti

I find it very frustrating if I approach it as a Ruby-ist expecting it to work like Array#join, but imho it makes sense in the context of a URI and I like that it functions per the URI specification.

sj26 avatar Jan 17 '19 00:01 sj26

Agree. Although I also find URI.join a bit weird, especially when comparing it to File.join, but as we are handling URIs here we should follow RFC of URI joining. After all we do follow RFC when we follow redirects.

ixti avatar Jan 17 '19 01:01 ixti

I was surprised there was no way to set a base path for a persistent connection and found this. Seems like a simple use case, creating a pool of persistent connections to a host on /api and making requests like get("users/123") and it actually calling GET /api/users/123. I was surprised that persistent(url) actually dropped the path entirely.

mikebaldry avatar Oct 01 '19 13:10 mikebaldry

@mikebaldry as I hope you can see from this thread (and #512 and #493), it's a case of "easier said than done"

tarcieri avatar Oct 01 '19 14:10 tarcieri