pydantic
pydantic copied to clipboard
AnyUrl adds trailing slash
Initial Checks
- [X] I confirm that I'm using Pydantic V2
Description
AnyURL now adds a trailing slash in validation of URLs that dont have a trailing slash. This caused CORS issues for us, because CORS doesnt allow trailing slash (I think, seems to be written here: https://www.w3.org/TR/2008/WD-access-control-20080214/)
In my opinion i see this as a bug because I dont see the option toggle this addition of trailing slash. Maybe its expected behaviour, please let me know then how I can turn it off.
Example Code
from pydantic import AnyHttpUrl as AnyUrlv2
from pydantic.v1 import AnyHttpUrl as AnyUrlv1
print(AnyUrlv2("http://localhost:3000"))
print(AnyUrlv1("http://localhost:3000", scheme="http"))
prints:
http://localhost:3000/
http://localhost:3000
Python, Pydantic & OS Version
pydantic version: 2.2.1
pydantic-core version: 2.6.1
pydantic-core build: profile=release pgo=false
install path: /Users/charelfeltenrq/.pyenv/versions/3.10.8/lib/python3.10/site-packages/pydantic
python version: 3.10.8 (main, Dec 2 2022, 11:54:08) [Clang 14.0.0 (clang-1400.0.29.202)]
platform: macOS-13.3.1-arm64-arm-64bit
optional deps. installed: ['email-validator', 'typing-extensions']
Selected Assignee: @dmontagu
somewhat related: https://github.com/pydantic/pydantic-settings/issues/104
At this point, to my understanding, it seems its intended to add the slash. Is it? In that case, I am interested in whats the optimal way to remove this trailing slash, without forgoing the other validation (as seems to be the proposed solution in the related issue)
This problem also becomes apparent in the url builder:
from pydantic import AnyHttpUrl as AnyHttpUrl
AnyHttpUrl.build(
scheme="https",
host="google.com",
path="/search",
)
produces: Url('https://google.com//search')
Is that the desired outcome? If yes, then does it mean that all Url builds we have in our codebase should now have a path that starts without a slash?
The second case is unrelated.
This is done by the URL rust crate, so it probably mandated by a rfc somewhere.
Sorry, the second case is related.
Another weird situation
from pydantic import AnyHttpUrl as AnyHttpUrl
AnyHttpUrl.build(
scheme="https",
host="google.com/",
path="search",
)
Url('https://google.com//search')
Another weird situation
That's technically valid and works, not sure we need to do anything special there.
I think we might come up with a better solution for this long term but for now if you don't need a Url
instance you could do something like:
from typing import Annotated, Any, List
from pydantic_core import CoreSchema, core_schema
from pydantic import AfterValidator, BeforeValidator, GetCoreSchemaHandler, TypeAdapter
from pydantic.networks import AnyUrl
class Chain:
def __init__(self, validations: List[Any]) -> None:
self.validations = validations
def __get_pydantic_core_schema__(self, source_type: Any, handler: GetCoreSchemaHandler) -> CoreSchema:
return core_schema.chain_schema(
[
*(handler.generate_schema(v) for v in self.validations),
handler(source_type)
]
)
def strip_slash(value: str) -> str:
return value.rstrip('/')
MyAnyUrl = Annotated[str, AfterValidator(strip_slash), BeforeValidator(str), Chain([AnyUrl])]
ta = TypeAdapter(MyAnyUrl)
print(ta.validate_python('https://www.google.com'))
# https://www.google.com
Thank you for posting this workaround, I am not 100% sure whats going on there, but to my understanding it seems that it creates a type MyAnyUrl
which is a str, but includes the validation from AnyUrl
. That could be useful, maybe we will use it or a similar idea.
However, I am still a bit baffled that it seems to be seen as not a problem that AnyUrl adds trailing /
to all Urls, to me this is not validation anymore as it actually changes the Url. And also .build
creating Urls with //
in them is problematic for our situation.
For now, we have decided to roll back to pydantic v1. Will there be any way to keep using the pydantic v1 Url behaviour (i.e. not adding /
) in v2? We tried via from pydantic.v1 import AnyUrl
but this caused errors inside BaseSettings
. Or is this really the desired behaviour and you are not planning on changing it back?
Lastly, just wanted to mention, I am a very happpy user of pydantic, and I love the work you all do, just this change of behaviour of Url is giving us a bit of a headache.
Also finding this new behavior very annoying. In pydantic v1 urls inherited from str, so it can be easily passed to library codes, where they usually expects string representation of urls. But now we need to str(model.url_field)
or model.url_field.unicode_string()
when passing urls, because now it is pydantic_core._pydantic_core.Url
I'm not sure which variant right (old or new url), but this little slash causes very strange and hours of uncatchable bug fixing.
FYI, I come up with this solution:
from typing import Annotated
from pydantic import AfterValidator, AnyUrl, BaseModel
URL = Annotated[AnyUrl, AfterValidator(lambda x: str(x).rstrip("/"))]
class Model(BaseModel):
url: URL
print(Model(url="https://google.com").url)
# > https://google.com
But sadly we cannot use this solution everywhere, because sometimes we need the last slash.
Seems, that v2 url validator adding slashes only to naked
urls, i.e. without path, which is acceptable:
from pydantic import AnyUrl, BaseModel
class Model(BaseModel):
url: AnyUrl
print(Model(url="https://google.com").url)
# > https://google.com/
print(Model(url="https://google.com/").url)
# > https://google.com/
print(Model(url="https://google.com/api").url)
# > https://google.com/api
print(Model(url="https://google.com/api/").url)
# > https://google.com/api/
Would be nice, if old behavior preserves, or like author suggested, we can choose behavior type. Thanks in advance!
+1 this is a bug.
In addition to a broken behavior when config URLs used to be used to append a path, like: https://google.com +"/search" (and all other variants with appending path in python)
There is also a violated principle of a least surprise when original value modified by a validation framework as when an URL is configured without slash is returned or saved in other components: DB, external services, they receive a new value which silently violated previous protocol until those actually start to execute it in a request.
Stumbled upon this, as well. It's not possible to take an instance of pydantic.AnyHttpUrl
and construct the same new instance of pydantic.AnyHttpUrl
using the attributes of the former. This worked perfectly fine in v1.
I consider this a breaking change. Can we please add it to the migration docs at least?
To my knowledge, the double slash will cause webservers to respond with a 404 as they expect a single slash as a path separator.
def test_v2():
from pydantic import TypeAdapter, AnyHttpUrl
redirect_uri = TypeAdapter(AnyHttpUrl).validate_python(
"http://localhost:3000/api/v1"
)
assert (
str(
AnyHttpUrl.build(
scheme=redirect_uri.scheme,
host=redirect_uri.host,
port=redirect_uri.port,
path=redirect_uri.path,
)
)
== "http://localhost:3000/api/v1"
)
def test_v1():
from pydantic.v1 import parse_obj_as, AnyHttpUrl
redirect_uri = parse_obj_as(AnyHttpUrl, "http://localhost:3000/api/v1")
assert (
str(
AnyHttpUrl.build(
scheme=redirect_uri.scheme,
host=redirect_uri.host,
port=redirect_uri.port,
path=redirect_uri.path,
)
)
== "http://localhost:3000/api/v1"
)
That's technically valid and works
@adriangb It might be technically valid. However, a URL containing a double slash is pointing to a different endpoint (not sure about the semantic) and thus I expect webservers to respond with a 404.
The use case I have in mind is sending a redirect URI to a client during an OAuth2 dance.
I had some validation checks because of this issue: the validator was testing for http://loinc.org
but I was getting http://loinc.org/
on the model_dump_json()
This is what I did:
from typing import Optional
from pydantic import BaseModel, AnyUrl, Extra, field_serializer
class Coding(BaseModel):
system: Optional[AnyUrl] = None
version: Optional[str] = None
code: Optional[str] = None
display: Optional[str] = None
userSelected: Optional[bool] = None
@field_serializer('system')
def serialize_system(self, system: AnyUrl):
return str(system).rstrip('/')
class Config:
extra = Extra.forbid
Another less-than-ideal workaround (if it's acceptable to strip all trailing slashes):
from typing import Annotated
from pydantic import AnyHttpUrl, AfterValidator, BaseModel, PlainValidator, TypeAdapter
AnyHttpUrlAdapter = TypeAdapter(AnyHttpUrl)
HttpUrlStr = Annotated[
str,
PlainValidator(lambda x: AnyHttpUrlAdapter.validate_strings(x)),
AfterValidator(lambda x: str(x).rstrip("/")),
]
class Model(BaseModel):
url: HttpUrlStr
url = Model(url="https://google.com").url
print(url)
# > https://google.com
This also solves another somewhat related issue: https://github.com/pydantic/pydantic/discussions/6395 In addition to the above points, IMO, when appending paths, this is more readable:
URL = f"{BASE_URL}/api/v1"
than this:
URL = f"{BASE_URL}api/v1"
I tried:
service_url: HttpUrl
@field_validator("service_url", mode="after")
@classmethod
def service_url_remove_trailing_slash(cls, service_url: HttpUrl) -> str:
return str(service_url).rstrip("/")
But then I got warnings:
/site-packages/pydantic/main.py:308: UserWarning: Pydantic serializer warnings:
Expected `url` but got `str` - serialized value may not be as expected
return self.__pydantic_serializer__.to_python(
How to keep the field as HttpUrl
without the trailing slash?
Edit:
HttpUrl
no longer supports +
operation with strings, e.g. foo.my_url + "/bar"
https://github.com/pydantic/pydantic/issues/8281
And if I workaround this by using custom type based on str
then I can no longer use properties like .host
on it.
Seems, that v2 url validator adding slashes only to
naked
urls, i.e. without path, which is acceptable:from pydantic import AnyUrl, BaseModel class Model(BaseModel): url: AnyUrl print(Model(url="https://google.com").url) # > https://google.com/ print(Model(url="https://google.com/").url) # > https://google.com/ print(Model(url="https://google.com/api").url) # > https://google.com/api print(Model(url="https://google.com/api/").url) # > https://google.com/api/
This inconsistency is my biggest issue, especially when trying to add on paths or sub-paths as others have mentioned. If you can't be certain if a url has a trailing slash or not out of the box then we almost need an os.path.join
style method that ensures exactly one forward slash between each path component.
For now I guess I'll just do this:
new_url = f"{str(URL).rstrip('/')}/path/here"
We got this problem when we migrated from pydantic 1 as well. In addition to that, a lot of code was broken since AnyUrl is not accepted as str any more. This is our solution if it will be helpful to anyone:
AnyUrlTypeAdapter = TypeAdapter(AnyUrl)
AnyUrlStr = Annotated[
str,
BeforeValidator(lambda value: AnyUrlTypeAdapter.validate_python(value) and value),
]
So we currently just use our "wrapper" AnyUrlStr everywhere for compatibility. It also keeps the original string without any modification if it passes the validation.
Wouldn't it be better to enhance the AnyUrl
type by giving it it's own implementation of __add__
providing a feature like urllib
's urljoin
?
It's definitely a BUG, not "documentation issue". Consider this example:
In [14]: url = AnyUrl.build(scheme='http', host='example.com', path='/')
In [15]: url = AnyUrl.build(scheme=url.scheme, host=url.host, path=url.path)
In [16]: url = AnyUrl.build(scheme=url.scheme, host=url.host, path=url.path)
In [17]: print(url)
http://example.com////
Thanks everyone for commenting.
I'd be happy to accept a PR to allow an opt-in change of behaviour. While i know many people here think the current behaviour is a bug, there will be others who are relying on it, so we can't change the default behavior in a minor release.
Here's my proposal for what needs to change:
- change this code to add another fiels to the
PyUrl
rust struct called something likeomit_trailing_slash: bool
, defaultfalse
only applies whenlib_url.path == "/"
- Based on that flag, change the behaviour of
__str__
,__repr__
,unicode_string
andpath
to omit the trailing slash if set - Same changes for
PyMultiHostUrl
- change the
url_schema()
core schema to acceptomit_trailing_slash
- same for
multi_host_url_schema
- we'll then need a PR to pydantic to add a config flag
url_omit_trailing_slash
(defaultFalse
) which changes how the core schema is built
If anyone's interested, feel free to submit a PR. Might be a nice start for anyone interested in getting their hands dirty with rust as (AFAIK) it shouldn't be a too complicated change, but also not trivial.
If no one is against it, I want to try!
@tworedz, go for it.
Wouldn't it be better to enhance the
AnyUrl
type by giving it it's own implementation of__add__
providing a feature likeurllib
'surljoin
?
@JensHeinrich, interesting idea - PR welcome.
@samuelcolvin that's the problem with this ticket - it puts the problem wrong.
The problem is not with trailing slashes, it's about leading path slash.
/api
and /api/
are different paths, pydantic handles them correctly, flag name omit_trailing_slash
would be confusing!
There are TWO problems:
-
Behavior with normalizing urls - adding slash to path if path is empty. From HTTP protocol point of view it makes perfect sense, (because you can't send a request to "" path) - but for other protocols may not. So option like "ensure_absolute_path" is possible, enabled by default for compatibility, and may be overridden, but I believe this is sane default anyway.
-
BUG with
Url.build
(why I created separate https://github.com/pydantic/pydantic/issues/8640 for that) - that adding LEADING slashes no matter if there WAS another leading slash already. Strictly saying, it doesn't relate to normalizing urls problem.
For what it's worth, the outcome of https://github.com/pydantic/pydantic-core/pull/1173 was that we think it's best to get the following upstream feature added for us to do Url.build
properly: https://github.com/servo/rust-url/issues/835
Based on the upvotes on that issue, anyone who is interested in taking that on would make a lot of people happy!
I keep running into this issue in a project I work on. The real problem isn't omission of slashes, it's the fact that hostnames get slashes added to them even if the user or developer did not specify them.
So
http://localhost:3000 -> http://localhost:3000/
But
http://localhost:3000/api -> http://localhost:3000/api
I feel like I've been on a discussion about this somewhere already, but I keep running into this issue. If we had a way to make it so that we could use the type validation features but not add a slash to hostname (like it was in v1) that would be fantastic.
@samuelcolvin would omit_trailing_slash: bool
then omit slashes in all cases, even if user specifies them? I don't think it should, because then you have the same problem as before, except now you're just omitting the slash instead of adding it.
also having this issue with CORS and need to work around