GH-37929: [Python] begin moving static settings to pyproject.toml

Open anjakefala opened this issue 1 year ago • 28 comments

Rationale for this change

This change migrates Arrow towards modern Python packaging standards; see PEP 517 and PEP 518.

  • GitHub Issue: #37929

This PR focuses on migrating the static settings, the metadata and version, to pyproject.toml. Future PRs will migrate more of the build process to pyproject.toml.

anjakefala avatar Apr 05 '24 17:04 anjakefala

:warning: GitHub issue #37929 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Apr 05 '24 17:04 github-actions[bot]

I will try to investigate how other projects handle development versions with pyproject.toml!

anjakefala avatar Apr 05 '24 18:04 anjakefala

I found this! https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#dynamic-metadata I'll get to work updating.

anjakefala avatar Apr 05 '24 19:04 anjakefala

I am actively trying to figure out how to fix what's coming up in the build failures!

anjakefala avatar Apr 08 '24 18:04 anjakefala

This is the current challenge for this PR.

To migrate the static metadata to the pyproject.toml, we need to set a version. In PyArrow, the version is set dynamically using setuptools_scm.

setuptools_scm only lets you configure callables in setup.py. We set two of them: one for parse and one for version_scheme.

def parse_git(root, **kwargs):
    """
    Parse function for setuptools_scm that ignores tags for non-C++
    subprojects, e.g. apache-arrow-js-XXX tags.
    """
    from setuptools_scm.git import parse
    kwargs['describe_command'] =\
        'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"'
    return parse(root, **kwargs)


def guess_next_dev_version(version):
    if version.exact:
        return version.format_with('{tag}')
    else:
        # default_version comes from elsewhere in setup.py (not shown here)
        def guess_next_version(tag_version):
            return default_version.replace('-SNAPSHOT', '')
        return version.format_next_version(guess_next_version)

As they currently stand, we cannot configure these in pyproject.toml, because it will not accept a Python callable.
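
To make the behaviour of the version_scheme callable concrete, here is a self-contained sketch that runs guess_next_dev_version against a stand-in object. FakeScmVersion is hypothetical and only mimics the attributes and methods the callable touches; the real object comes from setuptools_scm. The value of default_version, the tag names, and the default format string are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeScmVersion:
    """Hypothetical stand-in for the version object setuptools_scm passes in."""
    tag: str
    distance: int
    exact: bool

    def format_with(self, fmt):
        return fmt.format(tag=self.tag, distance=self.distance)

    def format_next_version(self, guess_next, fmt="{guessed}.dev{distance}"):
        # Assumed to mirror the default "<next>.dev<distance>" layout.
        return fmt.format(guessed=guess_next(self.tag), distance=self.distance)


default_version = "16.0.0-SNAPSHOT"  # assumed value, for illustration only


def guess_next_dev_version(version):
    if version.exact:
        return version.format_with("{tag}")

    def guess_next_version(tag_version):
        # The next version is taken from default_version, not from the tag.
        return default_version.replace("-SNAPSHOT", "")

    return version.format_next_version(guess_next_version)


on_tag = guess_next_dev_version(FakeScmVersion("16.0.0", 0, True))
after_tag = guess_next_dev_version(FakeScmVersion("15.0.2", 453, False))
print(on_tag)     # 16.0.0
print(after_tag)  # 16.0.0.dev453
```

The point of the sketch is that the callable ignores the tag when guessing the next version, which is why it cannot be replaced by a plain string option.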

The next part of the challenge is that if you move the version metadata to pyproject.toml, none of the configuration in setup.py is picked up. That is why the build is failing. So you cannot put the static variables in pyproject.toml and also pass the Python callables to setup.py via use_scm_version. It is an all-or-nothing migration.

I'm thinking that the next step is to contact the maintainers of setuptools_scm, and see if they have any advice.

anjakefala avatar Apr 11 '24 22:04 anjakefala

The setuptools_scm docs have an example of passing a callable in setup.py while also using pyproject.toml: https://setuptools-scm.readthedocs.io/en/latest/customizing/#providing-project-local-version-schemes So based on that it seems this should be possible?

jorisvandenbossche avatar Apr 12 '24 15:04 jorisvandenbossche

Also, it seems that we use the parse_git callable only to run a custom git describe invocation. Nowadays setuptools_scm also has an option to directly override which describe command is used (git_describe_command), so that part can probably be turned into static configuration instead of a callable.

jorisvandenbossche avatar Apr 12 '24 15:04 jorisvandenbossche

So based on that it seems this should be possible?

I found open issues for the behaviour I am noticing:

https://github.com/pypa/setuptools_scm/issues/827
https://github.com/pypa/setuptools_scm/issues/1011

anjakefala avatar Apr 16 '24 17:04 anjakefala

Locally, the test seems to be behaving decently:

(base) ~/git/arrow/python kef/pyproject $ ls dist 
pyarrow-16.0.0.dev453+g51a3831e4.d20240416.tar.gz
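
For readers unfamiliar with this version shape, the sketch below splits it into its parts with a regular expression. The pattern is a simplification written for this one example, not a full PEP 440 parser, and the interpretation of each part (commits since the tag, abbreviated commit hash, date stamp for a working tree with uncommitted changes) reflects my understanding of the default setuptools_scm schemes.

```python
import re

# Simplified pattern for versions shaped like the sdist name above.
DEV_VERSION = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"      # next release, e.g. 16.0.0
    r"\.dev(?P<distance>\d+)"           # commits since the matched tag
    r"\+g(?P<commit>[0-9a-f]+)"         # abbreviated git commit hash
    r"(?:\.d(?P<dirty_date>\d{8}))?$"   # date, present for a dirty tree
)


def split_dev_version(version):
    """Return the named parts of a development version string."""
    match = DEV_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a recognised development version: {version!r}")
    return match.groupdict()


parts = split_dev_version("16.0.0.dev453+g51a3831e4.d20240416")
print(parts)
```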

The question now is why it is failing in CI.

anjakefala avatar Apr 16 '24 20:04 anjakefala

Note that we're not married to setuptools_scm. If we find out that something else would work better for us, then we can switch to it.

Found this comparison using a quick search: https://github.com/jwodder/versioningit/issues/46#issuecomment-1501201632

pitrou avatar Apr 17 '24 14:04 pitrou

setuptools_scm does have logging. I'm going to see if the logging helps reveal anything. If not, I'll explore the alternatives!

anjakefala avatar Apr 24 '24 17:04 anjakefala

From my investigation, it seems like the problem is that setuptools-scm is missing from the release VM:

/opt/hostedtoolcache/Python/3.12.3/x64/bin/python3: No module named setuptools_scm

It's not being installed. This is confusing, because it is listed under [build-system] in pyproject.toml. I'll look into that.
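
One possible explanation, stated as an assumption: the [build-system] requirements are installed by PEP 517 build frontends (such as pip or build) into an isolated build environment, and are not installed when setup.py is invoked directly. A quick way to check whether the interpreter running the build can actually see the module is sketched below; the helper name is hypothetical.

```python
import importlib.util
import sys


def is_importable(module_name):
    """True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Print the interpreter path too, since CI images often have several.
print("interpreter:", sys.executable)
for name in ("setuptools", "setuptools_scm"):
    print(f"{name} importable: {is_importable(name)}")
```

Running this with the same python3 binary named in the error message shows whether the module is missing from that environment specifically.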

anjakefala avatar Apr 24 '24 21:04 anjakefala

Okay, I managed to migrate all of the settings, and it is now working locally and on the CI! This is ready for review.

Note: I'm not quite sure what to do about the macos-latest runner for the "Source Release and Merge Script" build:

Error: The current runner (macos--arm64) was detected as self-hosted because the platform does not match a GitHub-hosted runner image (or that image is deprecated and no longer supported).

anjakefala avatar Apr 24 '24 22:04 anjakefala

Note: I'm not quite sure what to do about the macos-latest runner for the "Source Release and Merge Script" build:

I think this is due to macos-latest being moved to an arm64 runner, which isn't accounted for in the scripts. Ah yes, here: #41371. So nothing for you to worry about!

assignUser avatar Apr 24 '24 23:04 assignUser

@github-actions crossbow submit -g python

assignUser avatar Apr 24 '24 23:04 assignUser

Revision: 40dd4533d6c45c44edd6efea58fa8507910f3983

Submitted crossbow builds: ursacomputing/crossbow @ actions-1cf142956d

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest GitHub Actions
test-conda-python-3.10-pandas-nightly GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0 GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest GitHub Actions
test-cuda-python GitHub Actions
test-debian-12-python-3-amd64 Azure
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 GitHub Actions

github-actions[bot] avatar Apr 24 '24 23:04 github-actions[bot]

The version is not correct here: https://github.com/ursacomputing/crossbow/actions/runs/8824626991/job/24227425713#step:3:4462 Looks like the env has too old a version of setuptools_scm? https://github.com/ursacomputing/crossbow/actions/runs/8824626991/job/24227425713#step:3:2940

assignUser avatar Apr 25 '24 16:04 assignUser

@assignUser Yeah, this can happen if we build arrow using python3 setup.py sdist without having at least version 8.0 of setuptools-scm installed. How do I update these environments?
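
A minimal sketch of how an environment could be checked for this, using importlib.metadata from the standard library. The helper names are hypothetical, and the threshold of 8 is simply the minimum mentioned in the comment above.

```python
import re
from importlib import metadata

REQUIRED_MAJOR = 8  # minimum major version mentioned above


def major_of(version_string):
    """Leading integer of a version string such as '8.0.4'."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return int(match.group())


def installed_major(dist_name):
    """Major version of an installed distribution, or None if it is absent."""
    try:
        return major_of(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


found = installed_major("setuptools-scm")
if found is None:
    print("setuptools-scm is not installed in this environment")
elif found < REQUIRED_MAJOR:
    print(f"setuptools-scm {found}.x is too old; need >= {REQUIRED_MAJOR}")
else:
    print(f"setuptools-scm {found}.x is recent enough")
```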

anjakefala avatar Apr 25 '24 17:04 anjakefala

@github-actions crossbow submit -g python

anjakefala avatar Apr 25 '24 17:04 anjakefala

Revision: 7943b88ea08a57b8bb6737ff61de75813859ace6

Submitted crossbow builds: ursacomputing/crossbow @ actions-770bf60888

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest GitHub Actions
test-conda-python-3.10-pandas-nightly GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0 GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest GitHub Actions
test-cuda-python GitHub Actions
test-debian-12-python-3-amd64 Azure
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 GitHub Actions

github-actions[bot] avatar Apr 25 '24 17:04 github-actions[bot]

We are working on figuring out how best to update these environments. I want to research what the recommended build method is, now that Python is moving away from setup.py. If we need to stick with python setup.py sdist, I'll need to find each environment that needs setuptools_scm installed.

anjakefala avatar Apr 29 '24 16:04 anjakefala

We are working on figuring out how best to update these environments. I want to research what the recommended build method is, now that Python is moving away from setup.py. If we need to stick with python setup.py sdist, I'll need to find each environment that needs setuptools_scm installed.

In general we should move away from python setup.py sdist anyway, because those kinds of direct setup.py calls are deprecated, so this seems like a good reason to actually do so (see https://setuptools.pypa.io/en/latest/deprecated/commands.html and https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for context).

Specifically for building an sdist or wheel, the recommended way nowadays is to use the build package (python -m build --sdist for an sdist; see build.pypa.io).
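
For scripts that currently shell out to setup.py, the switch could be sketched as follows. This assumes the build package is installed in the calling environment (pip install build); the function names and the default source directory are hypothetical.

```python
import subprocess
import sys


def sdist_command(source_dir="."):
    """Command line equivalent to `python -m build --sdist <source_dir>`."""
    return [sys.executable, "-m", "build", "--sdist", source_dir]


def build_sdist(source_dir="."):
    """Run the build; raises CalledProcessError if the build fails."""
    subprocess.run(sdist_command(source_dir), check=True)


# Only print the command here; call build_sdist("python") to actually run it.
print(" ".join(sdist_command("python")))
```

Using sys.executable keeps the build tied to the interpreter that runs the script, which matters on CI images with several Python installations.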

(I know we still document python setup.py build_ext --inplace for local development, and I also use that myself; for this one the alternative is less clear.)

jorisvandenbossche avatar May 01 '24 10:05 jorisvandenbossche

@github-actions crossbow submit example-python-minimal-build-fedora-conda

raulcd avatar May 03 '24 10:05 raulcd

Revision: 7943b88ea08a57b8bb6737ff61de75813859ace6

Submitted crossbow builds: ursacomputing/crossbow @ actions-eb4456d5fc

Task Status
example-python-minimal-build-fedora-conda GitHub Actions

github-actions[bot] avatar May 03 '24 10:05 github-actions[bot]

The failures on example-python-minimal-build-fedora-conda and example-python-minimal-build-ubuntu-venv are due to environmental issues. They are trying to find the tag for apache-arrow-16.0.0.dev on the fork and can't compute the version. Those jobs are slightly different from the others because they use docker-compose with the minimal Python build examples (https://github.com/apache/arrow/tree/main/python/examples/minimal_build); they don't use archery and they don't set the SETUPTOOLS_SCM_PRETEND_VERSION variable. If I execute the jobs locally with:

$ cd arrow/python/examples/minimal_build
$ docker-compose build
$ docker-compose run --rm minimal-ubuntu-venv

they are successful.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==== 6115 passed, 1538 skipped, 7 xfailed, 2 warnings in 79.12s (0:01:19) =====

@anjakefala, for the sake of investigation, could you fetch the tags from the main arrow repo, push them to your fork, and retry? This should fix the crossbow jobs. If that's the case, we should open a follow-up issue, but this is somewhat unrelated to your PR.
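
For context on the SETUPTOOLS_SCM_PRETEND_VERSION variable mentioned above: when it is set, setuptools_scm uses its value as the version instead of inspecting git tags. A minimal sketch of preparing such an environment for a child process follows; the helper name and the example version string are assumptions.

```python
import os

PRETEND_VAR = "SETUPTOOLS_SCM_PRETEND_VERSION"


def env_with_pretend_version(version, base_env=None):
    """Return a copy of the environment with the version override set."""
    env = dict(os.environ if base_env is None else base_env)
    env[PRETEND_VAR] = version
    return env


# Example with a small explicit base environment.
env = env_with_pretend_version("16.0.0.dev0", base_env={"PATH": "/usr/bin"})
print(env[PRETEND_VAR])
```

The returned mapping can be passed as env= to subprocess.run when launching a build in a checkout that has no usable tags.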

raulcd avatar May 03 '24 11:05 raulcd

When I say I execute the jobs locally, I mean with the changes from your PR :)

raulcd avatar May 03 '24 11:05 raulcd

(The force push was just to deal locally with a merge conflict. Any actual update has been committed.)

anjakefala avatar May 17 '24 20:05 anjakefala

The failures you are seeing might be caused by https://github.com/apache/arrow/pull/41455, which started to do an "out of source" build.

jorisvandenbossche avatar May 18 '24 06:05 jorisvandenbossche

Specifically for building an sdist or wheel, the recommended way nowadays is to use the build package (python -m build --sdist for an sdist; see build.pypa.io).

Since this invocation requires the presence of an additional dependency (build), I've retracted my changes related to it for now. I wanted to make sure other things worked, before I added the complexity of making sure build was installed where it needed to be.

anjakefala avatar May 21 '24 21:05 anjakefala

The current failures are related to the bump-versions tests I referred to in my comment.

raulcd avatar May 22 '24 11:05 raulcd