flask-reuploaded icon indicating copy to clipboard operation
flask-reuploaded copied to clipboard

Replace os.path with pathlib

Open eumiro opened this issue 4 years ago • 11 comments

This change is not complete because the TestConflictResolution monkeypatches the os.path.exists and os.makedirs, so I was forced to use these methods instead of Path.exists and Path.mkdir. Other than that, the directory structure should be a little bit clearer in the code.

Let's discuss the changes here, update the tests (maybe do something relative within tmp_path?). I make this PR draft to prevent accidental merge, but let's review, please.

eumiro avatar Jan 18 '21 17:01 eumiro

Thanks for your PR!

Meanwhile I increased the coverage to 100% - if you could rebase your PR, then there can't be any more coverage glitches.

I will need some time to review the PR, as it uses quite some idioms which I do not use regularly.

jugmac00 avatar Jan 19 '21 12:01 jugmac00

Thanks for the contribution and the patience as it took so long for me to write a review.

I looked at the diff many, many times, and I still feel the old version is more readable. This may be because I am not used to Path a lot.

As you correctly stated, due to the test suite, currently you have to convert the Path object to str quite often, so the real power of Path does not shine.

For me, there is one compelling reason to switch to Path, when the user base of Flask-Reuploaded wants it. Then there is the time to additionally accept Path objects in the configuration and also enable handling them in the backend.

jugmac00 avatar Apr 07 '21 17:04 jugmac00

For me, there is one compelling reason to switch to Path, when the user base of Flask-Reuploaded wants it.

I'll put my hand up for this. Pathlib is significantly better than the alternative. The ecosystem has been converging on it for some time now. Even pytest recently dropped support for py.path in favour of pathlib.

djmattyg007 avatar Apr 16 '22 00:04 djmattyg007

Hey @djmattyg007,

Thank you for your feedback!

I'll put my hand up for this. Pathlib is significantly better than the alternative. The ecosystem has been converging on it for some time now. Even pytest recently dropped support for py.path in favour of pathlib.

Could you elaborate a bit where you personally would benefit from flask-reuploaded using pathlib?

@eumiro's pull request currently only covers the internals of the library, with no user facing changes.

When we think of user facing changes, we'd need to decide whether we only accept Path-like objects, or both paths / strings and Path-like objects.

I personally think breaking backwards compatibility with thousands of installations is a no-go.

So, this would leave us with a more complicated API, ie accepting str + Path-like objects.

Let's have a look at the example application from the README:

import os
from flask import Flask, flash, render_template, request
from flask_uploads import IMAGES, UploadSet, configure_uploads

app = Flask(__name__)
photos = UploadSet("photos", IMAGES)
app.config["UPLOADED_PHOTOS_DEST"] = "static/img"

This would turn into...

import os
from pathlib import Path
from flask import Flask, flash, render_template, request
from flask_uploads import IMAGES, UploadSet, configure_uploads

app = Flask(__name__)
photos = UploadSet("photos", IMAGES)
app.config["UPLOADED_PHOTOS_DEST"] = Path("static/img")

Actually, first I would even need to think whether this is really a Path or rather a PurePath.

Anyway... This is the only change for the user that I see - the user now needs to import an additional library. I do see not a single benefit.

So, that's why I am happy you commented here. I hope you can offer some insight.

This is the user-facing side. Let's have a look at the maintainer side.

As I mentioned I cannot drop str support for paths, or I would break thousands of user's applications.

So I need to maintain an API which is twice as complicated as the current one. Additionally I would need to rewrite a bigger part of the test suite, and create a couple of more tests.

Rewrites cost a lot of (spare) time and there is the danger of introducing new bugs in a otherwise rock solid library.

I have a close look at the Python eco-system, so no worries about the future of this library.

jugmac00 avatar Apr 16 '22 08:04 jugmac00

I fully understand your concerns, and I feel that Python offers a pretty simple solution to this problem. Instead of depending on pathlib.Path, you can depend on Union[str, os.PathLike]. Then wherever you accept user input, you can just convert it directly to a Path object like so:

import os
from pathlib import Path
from typing import Union

def function_called_by_user(mypath: Union[str, os.PathLike]):
    mypath = Path(mypath)
    # do stuff

This is by far the easiest way of providing a backwards-compatible interface that also accepts the widest range of input types possible, while also letting you immediately adopt pathlib.

If you're fetching values out of app.config, I guess you'd do something like this:

def another_function():
    dest = Path(app.config["UPLOADED_PHOTOS_DEST"])

djmattyg007 avatar Apr 17 '22 01:04 djmattyg007

Thanks @djmattyg007.

Let's put the technical implementation aside for a moment.

I still see neither an advantage for the users (they additionally need to import pathlib.Path), nor an advantage for myself, as I stated the current state is rock stable, and no work is much better than some work (+ the risk to introduce new bugs).

Which advantages are there for the users?

Additionally you may or may not be aware that pathlib.Path is considerably slower than os.path, see e.g. https://www.youtube.com/watch?v=tFrh9hKMS6Y - not that it would matter a lot for this code base.

jugmac00 avatar Apr 17 '22 07:04 jugmac00

they additionally need to import pathlib.Path

The fundamental reason why users want pathlib support in libraries such as these is because they are already importing pathlib and using it in the rest of their application. The burden comes from libraries such as these that don’t support pathlib objects, requiring us to manually cast Path objects to a string before passing them in.

djmattyg007 avatar Apr 18 '22 00:04 djmattyg007

Which advantages are there for the users?

The advantage is therefore less cognitive overhead when writing my own code, as I no longer have to think about what does and doesn’t support this core feature that’s been in the Python stdlib for a very long time.

djmattyg007 avatar Apr 18 '22 00:04 djmattyg007

Thanks!

Sounds reasonable.

I see two ways forward:

  • either just update the API, so the users can provide either a Path object or a path as a str
  • completely update the internals and the test suite, but still accept both Path objects and str as user input

I do not have the time to do this myself (building a house for my family), but I would certainly review a PR for either of the mentioned ways.

jugmac00 avatar May 10 '22 08:05 jugmac00

I've started looking into this. It turns out that (in Werkzeug 2 at least) the TestingFileStorage class already supports pathlib because Werkzeug supports it itself. The type hints just aren't accurate upstream.

djmattyg007 avatar May 14 '22 01:05 djmattyg007

I submitted a related PR to Werkzeug: https://github.com/pallets/werkzeug/pull/2418

djmattyg007 avatar May 14 '22 02:05 djmattyg007

@djmattyg007 Ok, I just checked and your upstream changes have been merged already.

If you or @eumiro would like to continue working on this PR, I am happy to accept it.

Though, please wait until https://github.com/jugmac00/flask-reuploaded/pull/135 is merged, so also the documentation could be updated then.

jugmac00 avatar Oct 01 '22 09:10 jugmac00