pydantic-core icon indicating copy to clipboard operation
pydantic-core copied to clipboard

Host is required for `MultiHostUrl`

Open samuelcolvin opened this issue 2 years ago • 8 comments

In theory we could have a MultiHostUrl, but with host optional.

Example from a postgresql dsn: postgresql://derp@/databasename?host=/var/run/postgresql

See https://github.com/pydantic/pydantic/issues/3900

samuelcolvin avatar Feb 03 '23 08:02 samuelcolvin

Keep in mind: I've never worked on the repo, never ran the tests, I havent even git cloned pydantic.

It seems like this is already handled in here (l.311 to l.332): https://github.com/pydantic/pydantic-core/blob/6616c9a3f8bf6fa409d4c845bd695250bcc4e18a/src/validators/url.rs#L311

awoimbee avatar Feb 15 '23 22:02 awoimbee

Url::parse seem to require a host to be set, so I'm not sure how you'd like to implement this?

Parse the URL, if it fails, try to look for query params (using url::form_urlencoded::parse maybe?), if you find one with host key, get the value, and try to Url::parse that? Not sure how to concatenate the found host with whatever is already set. It has to be after @ if there's a user/pw included in the URL, but if not, it has to be after ://. All of this seems a bit messy, when just encoding the host URL works fine, without using query params:
postgresql://derp@%2Fvar%2Frun%2Fpostgresql/databasename

JonasKs avatar Apr 29 '23 15:04 JonasKs

I think the normal Url allows no host, at least on some scenarios, worth looking at those tests.

samuelcolvin avatar Apr 29 '23 15:04 samuelcolvin

Yeah, I believe I could Url::parse only a schema, e.g. Url::parse("postgresql://").unwrap();, but that would fail if I put anything else in there. I see theres an Url::options which I'll check out, maybe it's possible to construct something easily. That will be tomorrow, thanks for the quick reply!

JonasKs avatar Apr 29 '23 15:04 JonasKs

Pretty sure you can do urls with scheme, and path, but no host - e.g. file URIs.

samuelcolvin avatar Apr 29 '23 15:04 samuelcolvin

can I assign myself?

romulocollopy avatar Jul 13 '23 16:07 romulocollopy

Yes, please feel free to assign yourself and open a PR

adriangb avatar Jul 13 '23 17:07 adriangb

I was taking a stab at implementing this and I think there are a few things worth discussing:

I think the normal Url allows no host, at least on some scenarios, worth looking at those tests.

Pretty sure you can do urls with scheme, and path, but no host - e.g. file URIs.

According to the URL Standard that rust-url follows:

If we look at the first rule, postgresql://derp@/databasename is not a valid URL, even though it is a valid postgres DSN. This is confirmed by the reference implementation of said standard.

MultiHostUrl already does not require host:

url = MultiHostUrl('postgresql:///databasename')
assert url.hosts() == []
assert url.path == '/databasename` 

It just adheres to the URL Standard and doesn't allow user/password/port when host is missing. Given how much current implementation relies on the rust-url and that it's possible to represent the same DSN with a valid URL postgresql:///databasename?user=derp, I think that this corner-case is not worth supporting at the moment and the issue should be closed.


There's another issue with MultiHostUrl.

Currently, MultiHostUrl parses connection strings as if each host can have its own user/pass:

assert MultiHostUrl('postgresql://user1:pass1@host1,user2:pass2@host2').hosts() == [
    {'username': 'user1', 'password': 'pass1', 'host': 'host1', 'port': None},
    {'username': 'user2', 'password': 'pass2', 'host': 'host2', 'port': None},
]

But both mongodb and postgres only allow user spec to be specified once, common for all specified hosts. Even if user specs are the same, such DSNs are invalid - psql parses postgresql://user1:pass1@host1,user2:pass2@host2 as user = user1; pass = pass1; hosts = [host1:5432, user2:pass2@host2].

I wonder whether this was a conscious decision to parse it like that and represent each "host" as this dict of four (perhaps in anticipation of use cases where this is allowed), or just a misinterpretation of mongo/postgres formats.

At least from mongo/postgres standpoint, this is definitely broken. We're allowing clearly invalid DSNs; even for valid DSNs the logic of MultiHostUrl feels off:

assert MultiHostUrl('postgresql://user1:pass1@host1,host2').hosts() == [
    {'username': 'user1', 'password': 'pass1', 'host': 'host1', 'port': None},
    # missing username and password feel kinda wrong
    {'username': None, 'password': None, 'host': 'host2', 'port': None}, 
]

e.g. I'd expect to be able to "split" the DSN with something like

split_dsns = [
    MultiHostUrl.build(scheme=dsn.scheme, path=dsn.path, query=dsn.query, fragment=dsn.fragment, **host) 
    for host in dsn.hosts()
]

but this will produce incorrect results in current implementation.

An easy fix would be to have username and password as properties on MultiHostUrl and to remove them from hosts(), but this is a breaking change.

I'm willing to think about better ways to fix this and work on the implementation (probably in a separate issue) if everyone agrees that we should pursue this path.

@samuelcolvin WDYT?

okhaliavka avatar Oct 18 '23 01:10 okhaliavka