Switch from setup.cfg + setup.py to pyproject.toml only
As discussed today during the sprint, here is the PR to move to using pyproject.toml only. This can be done with ease because you do not need to compile any C-extension or anything. You can try out installing skrub configured with pyproject.toml only by running the following command in a fresh new env:
pip install -e "git+https://github.com/baggiponte/skrub@switch-to-pyproject#egg=skrub[dev]"
(if you use @... you can tell git to install from any commit/tag/branch; the #egg=skrub tells pip where the package is so that it can build the wheel; the [dev] at the end installs the package and dev dependencies. this will raise a deprecation warning: do not worry, it's just because you are installing from a branch in my fork, because I did not want to push the changes that I made to main)
The fact that the package is installed means that wheels can be built, i.e. the package can be published and distributed.
I simply ported everything inside setup.cfg following the official pyproject specification here as well as how to configure setuptools with pyproject.
I changed flake8 to ruff because flake8 cannot be configured with pyproject; ruff is a drop in replacement written in rust, very fast, extremely widely adopted (pandas, scipy, apache airflow, and many many more all use it). If you don't like the new tool, we can just leave the flake8 section in setup.cfg or in a .flake8 file.
I also edited the precommit config accordingly. I realised that while you configured codespell, it was not running in precommit, so I added that too. precommit runs fine; please do tell me of other ways to test whether something is broken! 😃
Hey @baggiponte, thanks for this PR! I agree that moving to pyproject.toml is an improvement, since we don't have compiled files, nor intend to have some.
I have no opinion on whether we should use ruff or not, despite its popularity.
However, for the sake of passing the CI, you might want to fix the ruff errors, since many flake8 ignored rules are not mirrored. See the error in the CI. Here is the list of ruff rules.
Using pyproject.toml with setuptools as its backend, it's good to mention that we can still generate bdist_wheel and sdist using either wheel:
pip wheel . --no-deps -w my_dist
or build (might need to be pip install):
python -m build
What do the other maintainers think? cc @GaelVaroquaux who might be interested in this.
Hi Vincent!
Thank you for your review. I would have gotten it done this weekend but I was busy. I shall get back to it tomorrow and fix it asap 💪🏼
Don't worry, there's no rush :) Thanks for staying motivated!
Hey @baggiponte, do you need help on this? :)
One thing worth considering is that when using setuptools, to include package data (non-python files that need to be shipped with the code) ATM with setuptools if we don't use setup.py, according to the documentation the 2 options are (i) list the data files in a MANIFEST.in (a bit tedious and there is the risk of forgetting some) or (ii) use the setuptools-scm plugin to include files that are tracked by git.
Ciao @Vincent-Maladiere, sorry if it took me so long to get back to this.
- I fetched the latest updates and merged in my working branch
- Removed the E24 altogether. My understanding is that if you say E24 flake8 will understand "every rule that starts with E24", which comprises E241 and E242, which denote multiple spaces and tabs after commas. This was ignored, I guess because black solves this beforehand.
UPDATE 1 Okay! Now there are no more errors related to my configuration. The error raised at this line is for a ruff rule that is not respected; the other errors below are from codespell: please tell me how I should proceed here :)
CircleCI fails because it expects setup.py to be there:
python: can't open file '/home/circleci/project/setup.py': [Errno 2] No such file or directory Exited with code exit status 2
I shall check the build_docs.sh script next!
UPDATE 2
Now this error is raised in CircleCI. My understanding is that is due to something else with sphinx, but I can't really figure out why. Does it normally work? Do you understand what might be going on? Error is here.
Extension error (sphinx.ext.autosummary): Handler <function process_generate_options at 0x7f71f98e0280> for event 'builder-inited' threw an exception (exception: no module named skrub.datasets)
Ciao @baggiponte! @jeromedockes raises a crucial point regarding package data (non-python files) and setup.py. We need to address this point before moving forward with pyproject.toml!
Switching packaging tools can take time since there are a lot of things to consider. In the meantime, feel free to tackle another issue :)
Ciao @baggiponte! @jeromedockes raises a crucial point regarding package data (non-python files) and setup.py. We need to address this point before moving forward with pyproject.toml!
Switching packaging tools can take time since there are a lot of things to consider. In the meantime, feel free to tackle another issue :)
There is a nice cookie-cutter based on pyproject.toml which could help as a guide if needed :) https://github.com/woltapp/wolt-python-package-cookiecutter/tree/master
Ciao @baggiponte! @jeromedockes raises a crucial point regarding package data (non-python files) and setup.py. We need to address this point before moving forward with pyproject.toml!
Ciao @Vincent-Maladiere, thanks for the review. I saw @jeromedockes comment indeed, but did not talk about it as I wanted to wait for your reply before opening a new thread.
This makes a lot of sense. IIUC, there's only one non-Python file to include, which is VERSION.txt. Am I wrong? If that were the only one, we can simply write the version to __version__ instead of reading the text file here.
Currently, VERSION.txt is read in the sphinx conf.py but it would suffice to add an import statement import skrub and a call to skrub.__version__ - e.g., on line 117 simply calling release = skrub.__version__.
I don't mean to push my idea, just trying to write out all possible alternatives. Up to you to decide the road to take :) I'll gladly look at the other issues.
Hi, yes ATM there is only the VERSION.txt; a few more are likely to appear at some point (eg tiny example datasets).
Such files are perfectly manageable without a setup.py; I just meant at some point we may need to choose between using a MANIFEST.in, using setuptools-scm, or using a different build backend than setuptools (all of which are easy to do).
It is probably better to keep the version in a txt file rather than a Python module because it is needed in the package metadata during the build & installation of skrub, when skrub cannot yet be imported. It's not a problem because in practice setuptools copies it to the build directory. And we could always add a small MANIFEST.in for now just to be safe.
Hi @baggiponte, after discussing with other skrub members, we think it's a good idea to go forward with the pyproject.toml! We're fine with ruff and having a small MANIFEST.in :)
Great! Do let me know how can I be of help to finish implementing this. I guess I should:
- Merge master and update
- Write the small MANIFEST.in
Something I forgot?
Finish the migration from flake8 to ruff, or is it done already?
Let me double check but I recall it was done.