Pathlib support
🚀 The feature
Add support for pathlib.Path in addition to str for any functions taking files or directories as input.
According to PEP-519, the type hints would look like:
Union[str, bytes, os.PathLike[str], os.PathLike[bytes]]
Motivation, pitch
TorchGeo is planning on adding support for this. However, we use a lot of utility functions like download_url that only accept str at the moment.
This was also requested in #7721 for a more limited scope, so users are clearly trying to use this.
It seems that most of the code in torchvision/prototype already uses pathlib? Maybe it's just a matter of waiting until those prototypes become the default. But I don't see any progress yet for porting the utility functions we use.
Alternatives
Our workaround for now is to convert all Paths to strings before calling download_url, but that kind of defeats the purpose of supporting Paths if we always immediately convert them to strings.
Additional context
Cross linking to https://github.com/microsoft/torchgeo/issues/1616 and https://github.com/microsoft/torchgeo/pull/1731
@pioneerHitesh
P.S. We would be willing to contribute this for the utilities we use in TorchGeo, but probably don't have the time/energy to contribute this for all of torchvision. Let me know what you think.
cc @pmeier
Thanks for the feature request @adamjstewart . This sounds reasonable, we could add support for pathlib.Path where it matters.
we use a lot of utility functions like
download_url
Just a heads up on this: while our download utils are effectively public, we really don't recommend that you rely on them in your library: those were introduced a long time ago as public but they should have been private from the beginning. We might deprecate and make them private in the future, so please migrate away from them if you can.
I acknowledge that there's a lack of canonical download utils in pytorch in general, and this has led to each domain library implementing their own solution for this... But torchvision isn't the place for such a util and our stuff shouldn't be considered a reference. (There are related discussions in https://github.com/pytorch/pytorch/issues/68320#issuecomment-1485651801).
Yeah, I liked the idea behind a shared resource like torchdata for download/extraction, but it seems like torchdata is dead. We might just add a new dependency on a non-torch library if we need it. We need support for a lot of weird extraction utilities like RAR and zip deflate so I'd like to avoid maintaining that code.
Let me poke around and see what I can find. If there isn't anything, maybe I'll make my own library.
I found:
Still looking for a good checksum tool. But it's a shame to have to add 2–3 new deps that I also need to track.
Hi~ New to wanting to contribute to pytorch, I was working on my project and found myself wanting this feature too and found this issue. Seems relatively straightforward. If so inclined, @adamjstewart do you mind sharing what you had in mind after your Nov 2023 comment? Maybe I can take a stab at it if this isn't being actively worked on.
@NicolasHug is this already implemented by #8196 and #8200?
oh fancy. so i'll see it soon in new versions :D should we close this one? -edit: ok so this is at least in download utils, guess the next is seeing if we can push this into other functions that may use paths
FWIW, I personally only care about the download utils. So happy to close this if it's complete, or happy to leave it open if @NicolasHug wants to track pathlib adoption for the entire repo.
@mlkimmins thank you for your interest.
The main places where we still need to add Pathlib support are:
- the public io stuff in particular
read_file,read_image,write_*. You can leave out the video stuff though. - All of the datasets'
rootparameters.
Feel free to submit a PR for any of these! Preferably in small-ish chunks of 3-4 APIs at a time, to make it easier to review. Thank you!
hi @mlkimmins , I'm curious if you had a chance to submit a PR for this yet? If not, I might have to take care of it myself if you don't mind: our branch cut is on Monday and we'd need to address this before then so that it can make it into the next release
Hi @NicolasHug ! I didn't see the previous message 2 wks ago, so no code, so you should go ahead and do it!
Closed by https://github.com/pytorch/vision/pull/8314 and https://github.com/pytorch/vision/issues/8120, thanks for opening the issue @adamjstewart