marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

pathlib.Path support

Open mivade opened this issue 4 years ago • 8 comments
trafficstars

It would be nice if there were a Marshmallow field for (de)serializing pathlib.Path objects since these are often more convenient to work with than raw strings and os.path functions. I typically use Marshmallow via marshmallow_dataclass and can work around this with hooks but having builtin support for Paths would be a great addition.

mivade avatar May 26 '21 19:05 mivade

@mivade, hello! I want to work on this, but im true don't know what you want) Can you get me some examples (or other materials), please? P.S. Im want only see what i can change

Yourun-proger avatar Aug 04 '21 19:08 Yourun-proger

I suspect paths might be too obscure of a type to include fields for it in marshmallow's core. I would recommend publishing any packages you create on pypi then making it discoverable by adding it to the ecosystem page.

https://github.com/marshmallow-code/marshmallow/wiki/Ecosystem

deckar01 avatar Aug 04 '21 19:08 deckar01

@deckar01 pathlib has been around since Python 3.4 and I'd hardly call it "too obscure." There are certainly many caveats to doing this right (e.g., the differences between the various concrete path classes) but I don't really see any reason why it couldn't be incorporated into Marshmallow core at this stage.

mivade avatar Aug 04 '21 20:08 mivade

I'm not saying pathlib is obscure, rather that using a system path as a schema field is. Custom field types should be convenient enough to allow users to wrap obscure types as needed.

https://marshmallow.readthedocs.io/en/stable/custom_fields.html

from marshmallow import fields, ValidationError
import pathlib


class Path(fields.Field):
    def _serialize(self, value, attr, obj, **kwargs):
        if value is None:
            return ""
        return str(value)

    def _deserialize(self, value, attr, data, **kwargs):
        return pathlib.Path(value)

deckar01 avatar Aug 04 '21 20:08 deckar01

Could you please explain a little more about what you mean by "using a system path as a schema field [is obscure]?" To give you some more insight into why I think this would be a useful feature, I'm using Marshmallow for more than just defining JSON schemas over HTTP (where a filesystem path is not very likely to make much sense). I also use it to load configuration files or validate CSV data, both of which often contain filesystem paths.

I understand how to write custom fields or the various other ways of making this work (Function is a field I didn't know about until recently which is yet another alternative) and that's fine for now but I tend to prefer not having to pull in yet another dependency (or worse, copy-pasting the same dozen or so lines everywhere I need it) for functionality that could easily be included in Marshmallow itself. I'm happy to open a PR if there is a willingness to consider it.

Thank you for this thoughtful discussion!

mivade avatar Aug 05 '21 14:08 mivade

I dug into the issues to see if I could find any similar requests for new fields and ran across https://github.com/marshmallow-code/marshmallow/pull/1735. It is also a part of the standard library (that I suspect the average user wouldn't need in a schema), but we added support for it. I suspect someone will be open to reviewing it if you made a PR. Feel free to flesh out the API here. What would the is_dir/file/mount/... API look like as field validation?

deckar01 avatar Aug 05 '21 20:08 deckar01

What would the is_dir/file/mount/... API look like as field validation?

That's really the question to me and probably what makes this a bit tricky. I could see something like this:

def __init__(
    self,
    exists: Optional[bool] = None,
    is_dir: Optional[bool] = None,
    is_mount: Optional[bool] = None,
    is_file: Optional[bool] = None,
    ...
) -> None:
    ....

but this is a bit awkward since it wouldn't make sense to have both is_dir and is_file set to True. This could of course be checked for but it could lead to some odd semantics. I wonder if it might make more sense to define validators that get passed in with the validate kwarg? I do think we might want some convenience boolean options such as resolve, expanduser, etc. which would call Path.resolve, Path.expanduser. Similarly it might be good to have a path_class argument that takes in the type of path class to use.

mivade avatar Aug 06 '21 16:08 mivade

I would recommend separate field types for validating the mutually exclusive attributes.

class Path(Field)
class Directory(Path)
class File(Path)

Using constructor args for optional validation is convenient, but providing validator classes avoids complex method signatures, if chains, and storing them all as instance properties. Much less code to maintain and more performant. You could minimize the API surface area by making a pass through validator for the boolean checks.

class Relative(Validator)
class Path(Validator):
    ...
    def __call__(self, path):
        for name, expected in self.checks:
            actual = getattr(path, name)()
            if  actual != expected:
                raise ValidationError(...)
Path(validate=[validate.Path(is_absolute=True), validate.Relative('/var/www/')])

I'm not sure if fields for the "pure" paths would need to be exposed also. pathlib doesn't seem to offer any validation when the path isn't on the current file system.

deckar01 avatar Aug 06 '21 19:08 deckar01