feat: pydantic v1 serialization and validation
Adds pydantic v1 support through validators like __get_validators__ and a custom .to_dict() method, intended to be used alongside Config.json_encoders in pydantic v1 style BaseModels.
I altered the testing structure for pydantic somewhat so that both v1 and v2 'style' validation are tested via parametrisation, and capped the pydantic version (for tests only) to <3 so that the pydantic v1 style validation is tested as well.
The main downside to capping the pydantic version for tests is that if pydantic makes API changes in subsequent versions that drop the v1 shim, this testing framework would have to be split out to ensure coverage (one environment with >=3 and another with <3, which would be a pain). This should have no effect on actual usage of the package, though, since UPath still does not explicitly use any pydantic code.
Issue link where this is discussed: https://github.com/fsspec/universal_pathlib/issues/397
Let me know if there is anything else you'd like me to add to this - thanks!
@ap-- would it be possible to approve the workflow here? Was going to mark this as ready once all the linting / tests pass correctly.
Hi @rx-dwoodward, could you comment on why this couldn't just implement __get_validators__? What's the limitation in v1 here?
So __get_validators__ is supposed to yield a number of validation functions, which means I need to include the classmethod to provide the actual validation that gets run. See here.
Well, their example yields a classmethod, but I don't see a requirement for this to be a classmethod. The requirement is that you yield a callable that adheres to the Validator interface. There seems to be a note that these are classmethods (https://docs.pydantic.dev/1.10/usage/validators/), but it's unclear to me whether this is actually a requirement. You could still just bind a helper function to the class if needed before yielding it.
Yeah, you are right, it doesn't have to be a classmethod. I just figured it's nicer to bind this validation as a method on the object rather than have a standalone function outside of it. Typically it's a classmethod so that cls(**v) or equivalent is used and validation returns an instance of the class, which ofc would be fine in this case by directly using UPath(...), because UPath dispatches to the correct type under the hood anyway.
So yes, it does not have to be a classmethod if you'd prefer; I just figured it was cleaner to have this attached to the class (as is normally the case) than to have a separate function outside of the class.
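To make the point above concrete, here is a minimal stdlib-only sketch of the `__get_validators__` protocol. `FakePath`, `_from_string`, `_check_type`, and `run_validators` are hypothetical names for illustration only; `run_validators` is a simplified imitation of how pydantic v1 chains the yielded callables, not pydantic's actual code.

```python
from typing import Any, Callable, Iterator


def _from_string(value: Any) -> Any:
    """Plain module-level function: turn a str into a FakePath, pass others through."""
    if isinstance(value, str):
        return FakePath(value)
    return value


class FakePath:
    """Hypothetical stand-in for UPath, used only in this sketch."""

    def __init__(self, path: str) -> None:
        self.path = path

    @classmethod
    def _check_type(cls, value: Any) -> "FakePath":
        # Classmethod validator: `cls` is already bound when it is yielded below.
        if not isinstance(value, cls):
            raise TypeError(f"expected {cls.__name__}, got {type(value).__name__}")
        return value

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[[Any], Any]]:
        yield _from_string     # a plain function is accepted
        yield cls._check_type  # and so is a bound classmethod


def run_validators(tp: type, value: Any) -> Any:
    """Simplified imitation: feed the value through each validator in order."""
    for validator in tp.__get_validators__():
        value = validator(value)
    return value


result = run_validators(FakePath, "s3://bucket/key")
print(type(result).__name__, result.path)
```

Both kinds of callable behave the same from the caller's side, so the choice between them is mostly about where the code should live.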
So it sounds to me as if:
def somehow_serialize_upath_to_dict(pth: UPath) -> UPathTypedDict: ...

class Model(BaseModel):
    path: UPath

    class Config:
        json_encoders = {UPath: somehow_serialize_upath_to_dict}
Would be a recipe we can't get around for pydantic v1 support. That is because unlike with v2, there is no way to extend a class that does not inherit from BaseModel so that it provides its (de-)serialization methods to pydantic.
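For intuition on how a type-keyed encoder mapping like json_encoders can cover UPath subclasses, here is a stdlib-only sketch. It assumes (my understanding of pydantic v1, not verified against this PR) that encoders are resolved by walking the value's MRO. `FakePath`, `FakeS3Path`, `path_to_dict`, and `make_default` are hypothetical names, and the dictionary keys are taken from the JSON shown later in this thread.

```python
import json
from typing import Any, Callable, Dict


class FakePath:
    """Hypothetical stand-in for UPath."""

    protocol = ""

    def __init__(self, path: str) -> None:
        self.path = path


class FakeS3Path(FakePath):
    """Hypothetical stand-in for a protocol-specific subclass."""

    protocol = "s3"


def path_to_dict(pth: FakePath) -> Dict[str, Any]:
    return {"path": pth.path, "protocol": pth.protocol, "storage_options": {}}


def make_default(encoders: Dict[type, Callable[[Any], Any]]) -> Callable[[Any], Any]:
    """Build a json.dumps `default` hook from a {type: encoder} mapping."""

    def default(obj: Any) -> Any:
        # Walk the MRO (skipping `object`) so an encoder registered for a
        # base class also applies to instances of its subclasses.
        for base in type(obj).__mro__[:-1]:
            if base in encoders:
                return encoders[base](obj)
        raise TypeError(f"{type(obj).__name__} is not JSON serializable")

    return default


encoded = json.dumps(
    {"path": FakeS3Path("bucket/path/to/key")},
    default=make_default({FakePath: path_to_dict}),
)
print(encoded)
```

Registering the encoder once for the base class is enough here, which is what makes a single `{UPath: ...}` entry a workable recipe.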
Is there a v1 way to make UPath the root model, so that there would be a UPathV1Model class that has the necessary pydantic methods?
Yeah, that's exactly right. It's a particular annoyance of pydantic v1, to be honest.
I mean you could do something like:
from typing import Any

from pydantic.v1 import BaseModel as V1BaseModel, PrivateAttr
from upath import UPath


class UPathModel(V1BaseModel):
    path: str
    protocol: str = ""
    storage_options: dict[str, Any] | None = None
    _upath: UPath = PrivateAttr(None)

    class Config:
        orm_mode: bool = True
        # required when pydantic models have properties to serialize correctly.
        keep_untouched = (property,)

    def __init__(self, **data) -> None:
        super().__init__(**data)
        # storage_options defaults to None, so fall back to an empty dict before unpacking
        self._upath = UPath(self.path, protocol=self.protocol, **(self.storage_options or {}))

    # enables construction from a single string rather than a dictionary with the field keys
    @classmethod
    def from_orm(cls, v: Any) -> "UPathModel":
        upath = UPath(v)
        return cls(path=upath.path, protocol=upath.protocol, storage_options=upath.storage_options)

    # provides 'public' access to the UPath created.
    @property
    def upath(self) -> "UPath":
        return self._upath


# consumer of the UPath / UPathModel
class Model(V1BaseModel):
    path: UPathModel


# construction of `UPathModel` from a string using `from_orm` under the hood
model = Model(path="s3://bucket/path/to/key")
type(model.path)  # == UPathModel
type(model.path.upath)  # == S3Path
model.json()  # == '{"path": {"path": "bucket/path/to/key", "protocol": "s3", "storage_options": {}}}'
Now this handles the jsonifying, because it jsonifies the model rather than the UPath, but it means you have to access the actual UPath via model.path.upath instead of just model.path.
We could change the UPathModel validation to return the UPath instead of the UPathModel, but that would get us back to where we started: json serializing the UPath fails and we need a json_encoders config argument.
So imo each solution is inelegant, but directly using the UPath as a field type is imo nicer than having to change how the path is accessed. Ofc you could mess around more and have __getattr__ on the UPathModel forward to getattr(self._upath, name), but that adds so much complexity imo compared to just asking v1 users to add the json_encoders entry for UPath with the convenience UPath.to_dict method.
LMK what you think though (and apologies for having to drop these annoying nuances of pydantic v1 on you 😓)
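As a rough illustration of why the __getattr__ route gets complicated, here is a stdlib-only sketch. `PathWrapper` is a hypothetical stand-in for UPathModel, and `pathlib.PurePosixPath` stands in for UPath; neither is code from this PR.

```python
from pathlib import PurePosixPath
from typing import Any


class PathWrapper:
    """Hypothetical wrapper that forwards unknown attributes to a wrapped path."""

    def __init__(self, path: str) -> None:
        self._upath = PurePosixPath(path)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        if name == "_upath":
            # Guard against infinite recursion if `_upath` is not set yet.
            raise AttributeError(name)
        return getattr(self._upath, name)


wrapped = PathWrapper("bucket/path/to/key")

# Plain attribute access is forwarded to the wrapped path.
print(wrapped.name)

# But the wrapper is still not a path as far as isinstance checks are concerned.
print(isinstance(wrapped, PurePosixPath))

# Operators are looked up on the type, so they bypass __getattr__ entirely.
try:
    wrapped / "child"
    operator_failed = False
except TypeError:
    operator_failed = True
print(operator_failed)
```

Covering isinstance checks and operators would need yet more forwarding code on the wrapper, which is the complexity being weighed against a one-line json_encoders entry.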
Yeah this is exactly the case. Hence I thought it would be nicer to have the serialization as a method on the UPath than import it from some other location even if the function name is clear. But lmk if you would prefer a different solution!
xref: pydantic v2/v1 download statistics: https://github.com/pydantic/pydantic/issues/11613#issuecomment-3242513660
Hi @rx-dwoodward
I spent a bit of time thinking about this, and I'd be happy to integrate basic functionality for serialization into universal-pathlib, and provide a recipe in the README on how to use it to set up a minimal pydantic v1 model.
If you're still interested, could you change this PR and add a upath.serializers module with upath_to_dict and upath_from_dict functions that convert between UPath and SerializedUPath? Then I think the only addition needed in upath.core for more convenient v1 support is the __get_validators__ class method, which would yield upath_from_dict and UPath itself.
The PR could then add a small section to the readme for pydantic support, and provide a recipe for how to use it with v1.
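A minimal sketch of what the upath_to_dict / upath_from_dict pair suggested above might look like. Everything here is an assumption for illustration: the fields of `SerializedUPath` are guessed from the JSON output earlier in this thread, and `FakeUPath` is a hypothetical stand-in so the example runs without universal-pathlib installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, TypedDict


class SerializedUPath(TypedDict):
    """Assumed shape of the serialized form (not confirmed by the PR)."""

    path: str
    protocol: str
    storage_options: Dict[str, Any]


@dataclass
class FakeUPath:
    """Hypothetical stand-in exposing the three attributes used in this thread."""

    path: str
    protocol: str = ""
    storage_options: Dict[str, Any] = field(default_factory=dict)


def upath_to_dict(pth: FakeUPath) -> SerializedUPath:
    # Copy storage_options so later mutation of the dict does not affect the path.
    return {
        "path": pth.path,
        "protocol": pth.protocol,
        "storage_options": dict(pth.storage_options),
    }


def upath_from_dict(data: SerializedUPath) -> FakeUPath:
    return FakeUPath(
        data["path"],
        protocol=data["protocol"],
        storage_options=dict(data["storage_options"]),
    )


original = FakeUPath("bucket/path/to/key", protocol="s3", storage_options={"anon": True})
serialized = upath_to_dict(original)
restored = upath_from_dict(serialized)
print(serialized)
print(restored == original)
```

With functions of this shape, a v1 user could pass upath_to_dict as the json_encoders entry, and __get_validators__ could yield upath_from_dict for the reverse direction.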