ARROW-17487: [Python][Packaging][CI] Add support for Python 3.11
This PR adds jobs to build pyarrow wheels for Python 3.11.
@github-actions crossbow submit cp311
https://issues.apache.org/jira/browse/ARROW-17487
:warning: Ticket has not been started in JIRA, please click 'Start Progress'.
Revision: 8e2613ff74b28f04f6bc449e204eed451bfa61c2
Submitted crossbow builds: ursacomputing/crossbow @ actions-fd1ab80f49
@github-actions crossbow submit cp311
Unable to match any tasks for `cp311`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3321608980
@github-actions crossbow submit cp311
Revision: 5dc9f31155413426b5d719bd8b1de3a6ef983afb
Submitted crossbow builds: ursacomputing/crossbow @ actions-3375acd0f2
@github-actions crossbow submit cp311
Revision: 936164bee69dd42aee0d83d4b5e166709d821aac
Submitted crossbow builds: ursacomputing/crossbow @ actions-2255056a88
We are looking forward to this one being merged in Apache Airflow. Pyarrow is one of the blocking factors to make Airflow work on Python 3.11, and I am trying to coordinate a concerted effort across the OSS projects that we consider friends :) to get Python 3.11 support working, as Python 3.11 mainly brings huge performance improvements that our users are eager to start using!
We track it in https://github.com/apache/airflow/pull/27264
If any help is needed, I am happy to help, also by talking to some dependencies of yours (which are likely also Airflow dependencies). Good luck with it :)
@raulcd Perhaps try applying this patch?
diff --git a/python/pyproject.toml b/python/pyproject.toml
index edbc4ade6..a799dc761 100644
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -18,7 +18,7 @@
[build-system]
requires = [
"cython >= 0.29.22",
- "oldest-supported-numpy>=0.14",
+ "oldest-supported-numpy>=2022.8.16",
"setuptools_scm",
"setuptools >= 40.1.0",
"wheel"
diff --git a/python/requirements-build.txt b/python/requirements-build.txt
index 46eb288c5..927c50d73 100644
--- a/python/requirements-build.txt
+++ b/python/requirements-build.txt
@@ -1,4 +1,4 @@
cython>=0.29
-oldest-supported-numpy>=0.14
+oldest-supported-numpy>=2022.8.16
setuptools_scm
setuptools>=38.6.0
diff --git a/python/requirements-wheel-build.txt b/python/requirements-wheel-build.txt
index 856164f09..a48b30d35 100644
--- a/python/requirements-wheel-build.txt
+++ b/python/requirements-wheel-build.txt
@@ -1,5 +1,5 @@
cython>=0.29.11
-oldest-supported-numpy>=0.14
+oldest-supported-numpy>=2022.8.16
setuptools_scm
setuptools>=58
wheel
diff --git a/python/requirements-wheel-test.txt b/python/requirements-wheel-test.txt
index 1644b2f8b..665b2ce77 100644
--- a/python/requirements-wheel-test.txt
+++ b/python/requirements-wheel-test.txt
@@ -2,26 +2,8 @@ cffi
cython
hypothesis
pickle5; platform_system != "Windows" and python_version < "3.8"
+oldest-supported-numpy>=2022.8.16
pytest
pytest-lazy-fixture
pytz
tzdata; sys_platform == 'win32'
-
-numpy==1.19.5; platform_system == "Linux" and platform_machine == "aarch64" and python_version < "3.7"
-numpy==1.21.3; platform_system == "Linux" and platform_machine == "aarch64" and python_version >= "3.7"
-numpy==1.19.5; platform_system == "Linux" and platform_machine != "aarch64" and python_version < "3.9"
-numpy==1.21.3; platform_system == "Linux" and platform_machine != "aarch64" and python_version >= "3.9"
-numpy==1.21.3; platform_system == "Darwin" and platform_machine == "arm64"
-numpy==1.19.5; platform_system == "Darwin" and platform_machine != "arm64" and python_version < "3.9"
-numpy==1.21.3; platform_system == "Darwin" and platform_machine != "arm64" and python_version >= "3.9"
-numpy==1.19.5; platform_system == "Windows" and python_version < "3.9"
-numpy==1.21.3; platform_system == "Windows" and python_version >= "3.9"
-
-pandas<1.1.0; platform_system == "Linux" and platform_machine != "aarch64" and python_version < "3.8"
-pandas; platform_system == "Linux" and platform_machine != "aarch64" and python_version >= "3.8"
-pandas; platform_system == "Linux" and platform_machine == "aarch64"
-pandas<1.1.0; platform_system == "Darwin" and platform_machine != "arm64" and python_version < "3.8"
-pandas; platform_system == "Darwin" and platform_machine != "arm64" and python_version >= "3.8"
-pandas; platform_system == "Darwin" and platform_machine == "arm64"
-pandas<1.1.0; platform_system == "Windows" and python_version < "3.8"
-pandas; platform_system == "Windows" and python_version >= "3.8"
> @raulcd Perhaps try applying this patch?
I tested the patch locally and while the build of the images is successful I got a lot of test failures:
640 failed, 3430 passed, 348 skipped, 15 xfailed, 2 xpassed, 5 warnings, 8 errors in 103.69s (0:01:43)
This is how I reproduce locally:
# generate wheel
PYTHON=3.11 docker-compose build --no-cache --progress plain python-wheel-manylinux-2014
PYTHON=3.11 docker-compose run --rm python-wheel-manylinux-2014
# test wheel
PYTHON=3.11 docker-compose build --no-cache python-wheel-manylinux-test-unittests
PYTHON=3.11 docker-compose run --rm python-wheel-manylinux-test-unittests
Wheels are built successfully at the moment. I am going to trigger the job again to validate the macOS ones, but the jobs are failing because 7 tests fail due to the change in behaviour of repr on the FileType enum, see: https://github.com/python/cpython/issues/94763 (thanks @jorisvandenbossche). We can probably fix those in a follow-up PR.
@github-actions crossbow submit cp311
This patch should help fix the 3.11 enum issue:
diff --git a/python/pyarrow/_fs.pyx b/python/pyarrow/_fs.pyx
index e7b028a07..557c08149 100644
--- a/python/pyarrow/_fs.pyx
+++ b/python/pyarrow/_fs.pyx
@@ -78,6 +78,12 @@ cdef CFileType _unwrap_file_type(FileType ty) except *:
assert 0
+def _file_type_to_string(ty):
+ # Python 3.11 changed str(IntEnum) to return the string representation
+ # of the integer value: https://github.com/python/cpython/issues/94763
+ return f"{ty.__class__.__name__}.{ty._name_}"
+
+
cdef class FileInfo(_Weakrefable):
"""
FileSystem entry info.
@@ -185,9 +191,10 @@ cdef class FileInfo(_Weakrefable):
except ValueError:
return ''
- s = '<FileInfo for {!r}: type={}'.format(self.path, str(self.type))
+ s = (f'<FileInfo for {self.path!r}: '
+ f'type={_file_type_to_string(self.type)}')
if self.is_file:
- s += ', size={}'.format(self.size)
+ s += f', size={self.size}'
s += '>'
return s
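For anyone who wants to see the helper from the patch above in isolation, here is a minimal sketch in plain Python. The `FileType` class below is a hypothetical stand-in for the real pyarrow enum (the member values are illustrative only), and `file_type_to_string` mirrors the idea of `_file_type_to_string` rather than being the exact code that ends up in `_fs.pyx`.

```python
from enum import IntEnum


class FileType(IntEnum):
    # Hypothetical stand-in for pyarrow's FileType; values are illustrative.
    NotFound = 0
    Unknown = 1
    File = 2
    Directory = 3


def file_type_to_string(ty):
    # Build "ClassName.MemberName" explicitly instead of relying on str(ty),
    # whose result for IntEnum members differs between Python versions.
    return f"{type(ty).__name__}.{ty.name}"


print(file_type_to_string(FileType.File))  # FileType.File
```

Because the string is assembled from the class name and member name, the result should not depend on which `__str__` the running Python version gives to `IntEnum` members.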
Revision: d5adbac21b4bafda8c488b75a1fd122bdffc98e6
Submitted crossbow builds: ursacomputing/crossbow @ actions-f88a7ca39e
The tests are trying to compile grpcio, can we avoid that? https://github.com/ursacomputing/crossbow/actions/runs/3330588690/jobs/5509267855#step:11:116
Either install the GCS testbench on a different Python (with binary wheels), or don't test GCS at all on 3.11.
@github-actions crossbow submit cp311
Revision: fdac52b0490287e88978b694daf5083f3eaf40ac
Submitted crossbow builds: ursacomputing/crossbow @ actions-77a8b7ac66
The only wheel job failing is wheel-macos-big-sur-cp311-universal2. There are issues resolving the numpy version from oldest-supported-numpy. I am trying to understand why this issue is only happening on this specific job:
Collecting oldest-supported-numpy>=0.14
Using cached oldest_supported_numpy-2022.8.16-py3-none-any.whl (3.9 kB)
Collecting setuptools_scm
Using cached setuptools_scm-7.0.5-py3-none-any.whl (42 kB)
Collecting setuptools>=58
Using cached setuptools-65.5.0-py3-none-any.whl (1.2 MB)
Collecting wheel
Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Collecting oldest-supported-numpy>=0.14
Using cached oldest_supported_numpy-2022.5.28-py3-none-any.whl (3.9 kB)
Using cached oldest_supported_numpy-2022.5.27-py3-none-any.whl (3.9 kB)
Using cached oldest_supported_numpy-2022.4.18-py3-none-any.whl (3.9 kB)
Using cached oldest_supported_numpy-2022.4.10-py3-none-any.whl (3.9 kB)
Using cached oldest_supported_numpy-2022.4.8-py3-none-any.whl (3.9 kB)
Using cached oldest_supported_numpy-2022.3.27-py3-none-any.whl (3.9 kB)
Using cached oldest_supported_numpy-2022.1.30-py3-none-any.whl (3.9 kB)
Using cached oldest_supported_numpy-0.15-py3-none-any.whl (3.8 kB)
Using cached oldest_supported_numpy-0.14-py3-none-any.whl (3.8 kB)
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of cython to determine which version is compatible with other requirements. This could take a while.
Collecting cython>=0.29.11
Using cached Cython-0.29.30-py2.py3-none-any.whl (985 kB)
ERROR: Cannot install -r /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/arrow/python/requirements-wheel-build.txt (line 2) because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
The conflict is caused by:
oldest-supported-numpy 2022.8.16 depends on numpy==1.23.2; python_version == "3.11" and platform_python_implementation != "PyPy"
oldest-supported-numpy 2022.5.28 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 2022.5.27 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 2022.4.18 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 2022.4.10 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 2022.4.8 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 2022.3.27 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 2022.1.30 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 0.15 depends on numpy; python_version >= "3.11"
oldest-supported-numpy 0.14 depends on numpy; python_version >= "3.11"
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
@raulcd it seems like numpy arm64 wheels are only available for macosx_11.0, while the failing job is trying to use MACOSX_DEPLOYMENT_TARGET=10.14. Setting a correct target here should help.
> it seems like numpy arm64 wheels are only available for the macosx_11.0
That's also the case for older Python versions, so if this is the reason it would be strange we only see it for 3.11?
I don't understand the version resolution conflict from pip. It seems total nonsense:
oldest-supported-numpy 2022.8.16 depends on numpy==1.23.2; python_version == "3.11" and platform_python_implementation != "PyPy"
oldest-supported-numpy 2022.5.28 depends on numpy; python_version >= "3.11"
...
How is that a conflict?
> > it seems like numpy arm64 wheels are only available for the macosx_11.0
>
> That's also the case for older Python versions, so if this is the reason it would be strange we only see it for 3.11?
Ah, but with older Python versions, oldest-supported-numpy will also pick an older numpy (e.g. 1.21.6 for Python 3.10: https://pypi.org/project/numpy/1.21.6/#files), and yes, older numpy versions have "universal" wheels with a 10_9 deployment target.
So it is here that the deployment target should be changed in case of 3.11:
https://github.com/apache/arrow/blob/c56934b57922a6cbb46eaef097a36ed8d2473467/dev/tasks/tasks.yml#L533-L539
(we could maybe also stop building universal2 wheels, since we also provide both arm64 and x86_64 wheels)
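To make the deployment-target reasoning above concrete, here is a rough sketch of the compatibility rule as I understand it. This is a simplified model and not pip's actual tag-matching code: the function names are made up for illustration, and the platform tags are just the two kinds mentioned in this thread.

```python
def parse_macos_tag(platform_tag):
    # Split e.g. "macosx_11_0_arm64" into ((11, 0), "arm64").
    prefix, major, minor, arch = platform_tag.split("_", 3)
    if prefix != "macosx":
        raise ValueError(f"not a macOS platform tag: {platform_tag}")
    return (int(major), int(minor)), arch


def wheel_usable(platform_tag, deployment_target, machine):
    # Simplified assumption: a wheel is usable when its minimum macOS
    # version does not exceed the deployment target and its architecture
    # matches the machine (universal2 covers both arm64 and x86_64).
    min_version, arch = parse_macos_tag(platform_tag)
    arch_ok = arch == machine or (
        arch == "universal2" and machine in ("arm64", "x86_64")
    )
    return arch_ok and min_version <= deployment_target


# Newer numpy only ships an 11.0 arm64 wheel: rejected for target 10.14.
print(wheel_usable("macosx_11_0_arm64", (10, 14), "arm64"))       # False
# Older numpy ships a 10_9 universal wheel: accepted for target 10.14.
print(wheel_usable("macosx_10_9_universal2", (10, 14), "arm64"))  # True
# Raising the target to 11.0 makes the arm64 wheel usable.
print(wheel_usable("macosx_11_0_arm64", (11, 0), "arm64"))        # True
```

Under this model, the 3.11 job fails only because the pinned numpy no longer has a wheel whose minimum macOS version is at or below 10.14.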
For the record, I posted https://discuss.python.org/t/dependency-resolution-conflict-on-universal2-with-pip-22-3-and-python-3-11/20419
> So it is here that the deployment target should be changed in case of 3.11:
But why would that be necessary only for universal2, not arm64?
Note that Numpy doesn't even provide universal2 wheels!
(I'd also be in favor of not bothering with universal wheels, btw)
> But why would that be necessary only for `universal2`, not `arm64`? Note that Numpy doesn't even provide `universal2` wheels!
Because for arm64 we set a different deployment target in tasks.yml: for that task we already set it to 11. And numpy did provide universal wheels for older numpy versions, so that's the reason it only fails for Python 3.11.
Oh, I see.
@github-actions crossbow submit -g wheel