GH-37929: [Python] begin moving static settings to pyproject.toml
Rationale for this change
This change begins migrating Arrow to modern Python packaging standards; see PEP 517 and PEP 518.
- GitHub Issue: #37929
This PR focuses on migrating the static settings, the metadata and version, to pyproject.toml. Future PRs will migrate more of the build process to pyproject.toml.
:warning: GitHub issue #37929 has been automatically assigned in GitHub to PR creator.
I will try to investigate how other projects handle development versions with pyproject.toml!
I found this! https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#dynamic-metadata I'll get to work updating.
I am actively trying to figure out how to fix what's coming up in the build failures!
This is the current challenge for this PR.
To migrate the static metadata to the pyproject.toml, we need to set a version. In PyArrow, the version is set dynamically using setuptools_scm.
Setuptools_scm will only let you configure Callables in the setup.py. There are two callables we set, one for parse and one for version_scheme.
def parse_git(root, **kwargs):
    """
    Parse function for setuptools_scm that ignores tags for non-C++
    subprojects, e.g. apache-arrow-js-XXX tags.
    """
    from setuptools_scm.git import parse
    kwargs['describe_command'] =\
        'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"'
    return parse(root, **kwargs)


def guess_next_dev_version(version):
    if version.exact:
        return version.format_with('{tag}')
    else:
        def guess_next_version(tag_version):
            return default_version.replace('-SNAPSHOT', '')
        return version.format_next_version(guess_next_version)

As they currently are, we cannot configure these in pyproject.toml, because it will not accept a Python callable.
The next part of the challenge is that if you move the version metadata to pyproject.toml, none of the configurations in setup.py will be picked up. That is why the build is failing. So you cannot put the static variables in pyproject.toml, and pass the Python callables into setup.py via use_scm_version. It is an all-or-nothing migration.
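To make the behaviour of the version_scheme callable concrete, here is a minimal sketch that runs guess_next_dev_version against a stand-in object. FakeScmVersion is hypothetical and only mimics the three members the callable touches (exact, format_with, format_next_version); the real object comes from setuptools_scm and carries more state. The default_version literal and the tag values are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumption: setup.py derives this value elsewhere; the literal is illustrative.
default_version = "16.0.0-SNAPSHOT"


@dataclass
class FakeScmVersion:
    """Hypothetical stand-in for the version object passed to a version scheme."""
    tag: str
    distance: int
    exact: bool

    def format_with(self, fmt):
        # Fill the template with the fields this sketch knows about.
        return fmt.format(tag=self.tag, distance=self.distance)

    def format_next_version(self, guess_next, fmt="{guessed}.dev{distance}"):
        # Ask the callback for the next version, then append the dev distance.
        guessed = guess_next(self.tag)
        return fmt.format(guessed=guessed, distance=self.distance)


def guess_next_dev_version(version):
    # On an exact tag, use the tag itself as the version.
    if version.exact:
        return version.format_with("{tag}")

    def guess_next_version(tag_version):
        # Ignore the tag and use the project's declared next version.
        return default_version.replace("-SNAPSHOT", "")

    return version.format_next_version(guess_next_version)


print(guess_next_dev_version(FakeScmVersion("15.0.0", 453, exact=False)))
print(guess_next_dev_version(FakeScmVersion("16.0.0", 0, exact=True)))
```

In this sketch the first call yields a development version built from default_version and the commit distance, while the second returns the tag unchanged.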
I'm thinking that the next step is to contact the maintainers of setuptools_scm, and see if they have any advice.
The setuptools_scm docs have an example of passing a callable in setup.py with using pyproject.toml: https://setuptools-scm.readthedocs.io/en/latest/customizing/#providing-project-local-version-schemes So based on that it seems this should be possible?
Also, it seems that we use parse_git callable to use a custom git describe invocation. But, nowadays setuptools_scm also has an option to directly override which describe command is used (git_describe_command). So that part can probably be turned into a static configuration instead of a callable.
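As a rough illustration of what the custom describe command filters, the sketch below applies the same glob pattern to a few tag names. It uses Python's fnmatch as an approximation of git's own pattern matching, and the tag names are hypothetical examples rather than a listing of the real repository tags.

```python
from fnmatch import fnmatchcase

# Pattern taken from the --match argument used in parse_git.
PATTERN = "apache-arrow-[0-9]*.*"

# Hypothetical tag names, for illustration only.
TAGS = [
    "apache-arrow-16.0.0",
    "apache-arrow-js-0.4.1",
    "apache-arrow-15.0.2-rc1",
]


def release_tags(tags, pattern=PATTERN):
    """Keep only tags whose name matches the glob pattern (case-sensitive)."""
    return [tag for tag in tags if fnmatchcase(tag, pattern)]


print(release_tags(TAGS))
```

The js tag is dropped because the pattern requires a digit directly after "apache-arrow-", which is the effect the docstring of parse_git describes.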
I found open issues for the behaviour I am noticing:
https://github.com/pypa/setuptools_scm/issues/827 https://github.com/pypa/setuptools_scm/issues/1011
Locally, the test seems to be behaving decently:
(base) ~/git/arrow/python kef/pyproject $ ls dist
pyarrow-16.0.0.dev453+g51a3831e4.d20240416.tar.gz
The question is understanding why it is failing in CI.
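For reference when comparing local and CI output, this sketch splits the sdist file name above into its parts. The regular expression is an assumption written for this one file name shape (release, dev distance, git node, optional date) and is not a general version parser.

```python
import re

SDIST = "pyarrow-16.0.0.dev453+g51a3831e4.d20240416.tar.gz"

# Assumed layout: <name>-<release>.dev<distance>+g<node>[.d<date>].tar.gz
PATTERN = re.compile(
    r"^(?P<name>[A-Za-z0-9_]+)-"
    r"(?P<release>\d+(?:\.\d+)*)"
    r"\.dev(?P<distance>\d+)"
    r"\+g(?P<node>[0-9a-f]+)"
    r"(?:\.d(?P<date>\d{8}))?"
    r"\.tar\.gz$"
)


def split_sdist_name(filename):
    """Return the named parts of the file name, or None if it does not match."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


parts = split_sdist_name(SDIST)
print(parts)
```

A CI artifact whose release or distance part differs from the local one would point at the version computation rather than at the packaging step.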
Note that we're not married to setuptools_scm. If we find out that something else would work better for us, then we can switch to it.
Found this comparison using a quick search: https://github.com/jwodder/versioningit/issues/46#issuecomment-1501201632
setuptools_scm does have logging. I'm going to see if the logging helps reveal anything. If nothing, I'll explore the alternatives!
From my investigation, it seems like the problem is that setuptools-scm is missing from the release VM:
/opt/hostedtoolcache/Python/3.12.3/x64/bin/python3: No module named setuptools_scm
It's not being installed. This is confusing, because it is under [build-system] in pyproject.toml. I'll look into that.
Okay, I managed to migrate all of the settings, and it is now working locally and on the CI! This is ready for review.
Note: I'm not quite sure what to do about the macos-latest runner for the "Source Release and Merge Script" build: Error: The current runner (macos--arm64) was detected as self-hosted because the platform does not match a GitHub-hosted runner image (or that image is deprecated and no longer supported).
I think this is due to macos-latest being moved to an arm64 runner which isn't accounted for in the scripts, ah yes here: #41371
So nothing to worry about for you!
@github-actions crossbow submit -g python
Revision: 40dd4533d6c45c44edd6efea58fa8507910f3983
Submitted crossbow builds: ursacomputing/crossbow @ actions-1cf142956d
The version is not correct here: https://github.com/ursacomputing/crossbow/actions/runs/8824626991/job/24227425713#step:3:4462 Looks like the env has too old a version of setuptools_scm? https://github.com/ursacomputing/crossbow/actions/runs/8824626991/job/24227425713#step:3:2940
@assignUser Yeah, this can happen if we install arrow using python3 setup.py sdist without having at least 8.0 of setuptools-scm. How do I update these environments?
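One way to spot an affected environment before building is to check the installed setuptools-scm version up front. The sketch below uses only the standard library; the helper name has_min_major is hypothetical, and comparing only the major version is a simplification that is enough for the "at least 8.0" requirement mentioned above.

```python
from importlib import metadata


def has_min_major(dist_name, min_major):
    """Return True if dist_name is installed with a major version >= min_major."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment at all.
        return False
    major = installed.split(".")[0]
    return major.isdigit() and int(major) >= min_major


if not has_min_major("setuptools-scm", 8):
    print("setuptools-scm >= 8 is not available in this environment")
```

Running this at the start of a job that calls python3 setup.py sdist directly would flag the environments that still need updating.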
@github-actions crossbow submit -g python
Revision: 7943b88ea08a57b8bb6737ff61de75813859ace6
Submitted crossbow builds: ursacomputing/crossbow @ actions-770bf60888
We are working on figuring out how best to update these environments. I want to research what the recommended build method is, now that Python is moving away from setup.py. If we need to stick with python setup.py sdist, I'll need to find each environment that needs setuptools_scm installed.
In general we should move away from python setup.py sdist anyway, because those kinds of direct setup.py calls are deprecated, so this seems like a good reason to actually do so.
(see https://setuptools.pypa.io/en/latest/deprecated/commands.html and https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for context)
Specifically for building an sdist or wheel, the recommended way to do that nowadays is using the build package (python -m build --sdist for sdist, build.pypa.io)
(I know we still document to use python setup.py build_ext --inplace for local development, and I also use that myself, for this one there is a less clear alternative)
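As a sketch of how a script could call the build package instead of setup.py sdist, the helper below assembles the python -m build --sdist command and runs it. The function names and the directory arguments are hypothetical, and it assumes the build package is already installed in the interpreter being used.

```python
import subprocess
import sys


def sdist_command(source_dir, out_dir):
    """Return the argv for building an sdist with the `build` package."""
    return [
        sys.executable, "-m", "build",
        "--sdist",
        "--outdir", str(out_dir),
        str(source_dir),
    ]


def build_sdist(source_dir, out_dir):
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(sdist_command(source_dir, out_dir), check=True)
```

Because build creates an isolated environment from [build-system] by default, this route would not depend on setuptools_scm being preinstalled, only on build itself.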
@github-actions crossbow submit example-python-minimal-build-fedora-conda
Revision: 7943b88ea08a57b8bb6737ff61de75813859ace6
Submitted crossbow builds: ursacomputing/crossbow @ actions-eb4456d5fc
| Task | Status |
|---|---|
| example-python-minimal-build-fedora-conda | |
The failures on example-python-minimal-build-fedora-conda and example-python-minimal-build-ubuntu-venv are due to environmental issues. They try to find the tag for apache-arrow-16.0.0.dev on the fork and can't compute the version. Those jobs differ slightly from the others because they use docker-compose with the minimal Python build examples: https://github.com/apache/arrow/tree/main/python/examples/minimal_build
They don't use archery, and they don't set the SETUPTOOLS_SCM_PRETEND_VERSION variable.
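For context, SETUPTOOLS_SCM_PRETEND_VERSION lets a build skip tag discovery by supplying the version through the environment. A minimal sketch of preparing such an environment for a subprocess follows; the helper name and the version string in the usage note are hypothetical.

```python
import os


def env_with_pretend_version(version, base_env=None):
    """Copy an environment and set the version override read by setuptools_scm."""
    env = dict(os.environ if base_env is None else base_env)
    env["SETUPTOOLS_SCM_PRETEND_VERSION"] = version
    return env
```

A caller could then pass env=env_with_pretend_version("16.0.0.dev0") to subprocess.run when building in a checkout that lacks the upstream tags.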
If I execute the jobs locally with:
$ cd arrow/python/examples/minimal_build
$ docker-compose build
$ docker-compose run --rm minimal-ubuntu-venv
they are successful.
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==== 6115 passed, 1538 skipped, 7 xfailed, 2 warnings in 79.12s (0:01:19) =====
@anjakefala for the sake of investigation could you fetch the tags from the main arrow repo and push them to your fork and retry? This should fix the crossbow jobs. If that's the case we should open a follow up issue but this is slightly unrelated to your PR.
When I say, I execute the jobs locally, it is with the changes on your PR :)
(The force push was just to deal locally with a merge conflict. Any actual update has been committed.)
The failures you are seeing might be caused by https://github.com/apache/arrow/pull/41455, which started to do an "out of source" build.
Regarding the suggestion to build the sdist with the build package (python -m build --sdist): since this invocation requires an additional dependency (build), I've retracted my changes related to it for now. I wanted to make sure everything else worked before adding the complexity of ensuring build is installed wherever it is needed.
The current failures are related to the bump-versions tests I referred to in my comment.