airflow
Python3.12
cc: @dirrao @Taragolis -> seems like apache-beam having numpy as dependency is the next problem to solve after pendulum is solved
Downloading numpy-1.24.4.tar.gz (10.9 MB)
#56 70.61 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.9/10.9 MB 36.6 MB/s eta 0:00:00
#56 70.61 Installing build dependencies: started
#56 70.61 Installing build dependencies: finished with status 'done'
#56 70.61 Getting requirements to build wheel: started
#56 70.61 Getting requirements to build wheel: finished with status 'error'
#56 70.61 error: subprocess-exited-with-error
#56 70.61
#56 70.61 × Getting requirements to build wheel did not run successfully.
#56 70.61 │ exit code: 1
#56 70.61 ╰─> [33 lines of output]
#56 70.61 Traceback (most recent call last):
#56 70.61 File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#56 70.61 main()
#56 70.61 File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#56 70.61 json_out['return_val'] = hook(**hook_input['kwargs'])
#56 70.61 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#56 70.61 File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 112, in get_requires_for_build_wheel
#56 70.61 backend = _build_backend()
#56 70.61 ^^^^^^^^^^^^^^^^
#56 70.61 File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 77, in _build_backend
#56 70.61 obj = import_module(mod_path)
#56 70.61 ^^^^^^^^^^^^^^^^^^^^^^^
Looks like we need NumPy 1.26+ - from that long discussion here: https://github.com/numpy/numpy/issues/23808 and Apache Beam has <1.25 even in main: https://github.com/apache/beam/blob/master/sdks/python/setup.py#L304
So likely the next best thing to do is to exclude apache-beam provider for python 3.12
This is kinda expected, Beam is always dragging us behind
Pushed a fixup marking it for exclusion - let's see.
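For illustration, marking a provider as excluded for a given Python version could be sketched roughly like this. The mapping and the function name are hypothetical and are not Airflow's actual build code; only apache.beam comes from this thread, the second entry is a placeholder.

```python
"""Sketch: skip providers that are marked as excluded for a Python version."""

# Hypothetical data: provider id -> Python versions it is excluded for.
# "apache.beam" is taken from the discussion; "sqlite" is only a placeholder.
EXCLUDED_PYTHON_VERSIONS = {
    "apache.beam": {"3.12"},
    "sqlite": set(),
}


def installable_providers(python_version, exclusions=EXCLUDED_PYTHON_VERSIONS):
    """Return the sorted provider ids that are not excluded for python_version."""
    return sorted(
        provider
        for provider, excluded in exclusions.items()
        if python_version not in excluded
    )


if __name__ == "__main__":
    print(installable_providers("3.12"))
    print(installable_providers("3.11"))
```

Keeping the exclusion as plain data next to the provider makes it easy to see, and later remove, once the upstream dependency catches up.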
Seems like 1.26.0 is the first numpy release which officially supports 3.12:
- https://github.com/numpy/numpy/releases/tag/v1.26.0rc1
Oh... I posted my comment without refreshing the page, and you had already found the same things
Seems like the Google provider is also not compatible with 3.12 yet
46.9 ERROR: Could not find a version that satisfies the requirement google-ads>=22.1.0; extra == "google" (from apache-airflow[google]) (from versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.3.1, 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.4.1, 3.0.0, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 4.0.0, 4.1.0, 4.1.1, 5.0.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.1.0, 6.0.0, 7.0.0, 8.0.0, 8.1.0, 8.2.0, 9.0.0, 10.0.0, 11.0.0, 11.0.1, 11.0.2, 12.0.0, 13.0.0, 14.0.0, 14.0.1, 14.1.0, 15.0.0, 15.1.0, 15.1.1, 16.0.0, 17.0.0, 18.0.0, 18.1.0, 18.2.0, 19.0.0, 20.0.0, 21.0.0, 21.1.0, 21.2.0, 21.3.0, 22.0.0)
246.9 ERROR: No matching distribution found for google-ads>=22.1.0; extra == "google"
Latest google-ads package pinned to >=3.7, <3.12
An issue for adding Python 3.12 support already exists: https://github.com/googleads/google-ads-python/issues/813
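A rough sketch of why the resolver skips every google-ads release on 3.12: the Requires-Python bound ">=3.7, <3.12" excludes it. This is a deliberately simplified check written for illustration (real tools use the full PEP 440 rules, for example via the packaging library); the function names are made up here.

```python
"""Sketch: simplified Requires-Python check (not a full PEP 440 implementation)."""
import operator

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse_version(text):
    """Turn '3.12' into (3, 12). Only plain dotted numbers are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def python_allowed(requires_python, python_version):
    """Return True if python_version satisfies every comma-separated clause."""
    current = _parse_version(python_version)
    for clause in requires_python.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so '>=' is not read as '>'.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = _parse_version(clause[len(symbol):])
                if not _OPS[symbol](current, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    print(python_allowed(">=3.7, <3.12", "3.11"))
    print(python_allowed(">=3.7, <3.12", "3.12"))
```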
All right ... let me exclude google provider too then. At this stage I have a feeling that excluding a few - even huge and important - providers and having an open issue to bring the 3.12 support in would be a good thing.
And I know for a fact that google team wants to split the google provider and splitting of ads was the first thing to try anyway, so that might accelerate things a bit.
Pushed.
BTW. I really like how nicely and transparently the new pyproject.toml exclusion works now: pre-commit simply updates pyproject.toml and it's immediately visible what is excluded :D
I think one day we need to finally resolve Consider splitting Google Provider, because the google provider is really huge (28k+ lines which are tracked by our tests) and contains quite a few different components:
- Cloud (GCP)
- Google ADS
- Google Suite
- LevelDB
- Google Firebase (is it part of GCP?)
So if we find a way to do it, it might prevent the situation where one of these components becomes a showstopper for the others
This is precisely the plan I am discussing with Google team :)
So we have a duckdb dependency issue now. It already has Python 3.12 support, which was added by https://github.com/duckdb/duckdb/pull/10144 (Linux and MacOS), but it is not released yet
Seems the 0.9.3.dev2258 pre-release is the first one which supports 3.12
❯ pip install duckdb==0.9.3.dev2258
Collecting duckdb==0.9.3.dev2258
Obtaining dependency information for duckdb==0.9.3.dev2258 from https://files.pythonhosted.org/packages/3c/43/094637a1939e8ba6ae53a788bd46adfd0b71fe0a6e182c8e6179b2966e09/duckdb-0.9.3.dev2258-cp312-cp312-macosx_11_0_arm64.whl.metadata
Downloading duckdb-0.9.3.dev2258-cp312-cp312-macosx_11_0_arm64.whl.metadata (768 bytes)
Downloading duckdb-0.9.3.dev2258-cp312-cp312-macosx_11_0_arm64.whl (13.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.7/13.7 MB 3.6 MB/s eta 0:00:00
Installing collected packages: duckdb
Successfully installed duckdb-0.9.3.dev2258
Pushed a change for it :)
One more step and a new error, this time it is related to the LevelDB, which is also part of google provider, but I guess it has separate extra
I think this is the same issue: https://github.com/wbolster/plyvel/issues/158
Running Build Prod Images still picks up the google ads provider.
#56 25.60 ERROR: Ignored the following versions that require a different python version: 2.7.3 Requires-Python <3.12,~=3.8; 2.7.3rc1 Requires-Python <3.12,~=3.8; 2.8.0 Requires-Python <3.12,~=3.8; 2.8.0b1 Requires-Python <3.12,~=3.8; 2.8.0rc1 Requires-Python <3.12,~=3.8; 2.8.0rc2 Requires-Python <3.12,~=3.8; 2.8.0rc3 Requires-Python <3.12,~=3.8; 2.8.0rc4 Requires-Python <3.12,~=3.8; 22.1.0 Requires-Python >=3.7, <3.12
#56 25.60 ERROR: Could not find a version that satisfies the requirement google-ads>=22.1.0 (from apache-airflow-providers-google) (from versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.3.1, 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.4.1, 3.0.0, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 4.0.0, 4.1.0, 4.1.1, 5.0.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.1.0, 6.0.0, 7.0.0, 8.0.0, 8.1.0, 8.2.0, 9.0.0, 10.0.0, 11.0.0, 11.0.1, 11.0.2, 12.0.0, 13.0.0, 14.0.0, 14.0.1, 14.1.0, 15.0.0, 15.1.0, 15.1.1, 16.0.0, 17.0.0, 18.0.0, 18.1.0, 18.2.0, 19.0.0, 20.0.0, 21.0.0, 21.1.0, 21.2.0, 21.3.0, 22.0.0)
#56 25.60 ERROR: No matching distribution found for google-ads>=22.1.0
Yes. Because the google provider is installed and expected to be installed when the PROD image is built. So if we do not build it locally for Python 3.12, it will install the one from PyPI. Luckily it looks like google ads maintainers are going to release a 3.12-compatible version this week https://github.com/googleads/google-ads-python/issues/813#issuecomment-1889826645 so we should - I think - wait for it. Releasing a 3.12 version without the google provider, when we know we will likely be able to install it in two days, is likely just not worth the effort (we would have to add code to exclude certain providers from PROD image installation).
In the meantime - we could take a close look at the failing tests for Python 3.12 (https://github.com/apache/airflow/actions/runs/7509965786/job/20448000561?pr=36755).
I think they mostly fail because google and beam providers are missing - but if there are any other tests we should look at them. I have not looked in detail yet but there are at least a few with "real" 3.12 incompatibilities (in test code at least) that could be fixed in the meantime:
AttributeError: 'called_once' is not a valid assertion. Use a spec for the mock if 'called_once' is meant to be an attribute.
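To show what this error is about, here is a small sketch comparing the mistaken pattern with the real assertion method. The helper names are made up for the example. On older Python versions the mistaken form passes silently because the mock auto-creates a truthy child attribute; Python 3.12 raises the AttributeError quoted above instead.

```python
"""Sketch: `assert m.called_once()` versus `m.assert_called_once()`."""
from unittest import mock


def wrong_check(m):
    """Run the mistaken pattern and report how this interpreter treats it."""
    try:
        # On Python < 3.12 this calls an auto-created child mock (always truthy).
        assert m.called_once()
    except AttributeError:
        # Python 3.12+ refuses the misspelled assertion name.
        return "rejected"
    return "passed silently"


def right_check(m):
    """Run the real assertion method and report the outcome."""
    try:
        m.assert_called_once()
    except AssertionError:
        return "failed"
    return "passed"


never_called = mock.Mock()
called_once = mock.Mock()
called_once()

if __name__ == "__main__":
    print(wrong_check(never_called))   # depends on the Python version
    print(right_check(never_called))
    print(right_check(called_once))
```

The point is that the mistaken form never actually verified anything, which is why fixing it may uncover tests that were passing for the wrong reason.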
cc: @dirrao @Taragolis -> Ads released with 3.12 support https://pypi.org/project/google-ads/ - removed the limit from Google provider, let's see.
Duckdb removed devel version we had pinned for 3.12 -> replaced it with >= for the new devel
Strangely enough the latest dev of duckdb does not support 3.12 - but the previous one does
BTW. When you try to compile sdist from duckdb - what happens with your CPU is ... Interesting
I thought my desktop Linux turns into an :airplane:
All right - it builds nicely on my machine - now we will just have to fix all the failing tests. Let's see how many of those there will be.
I think I fixed the errors with called_once in a few tests. We also have to exclude cassandra for now:
# Cassandra provider is not yet compatible with Python 3.12
# The main issue is that python cassandra driver by default uses asyncore which has been deprecated since
# Python 3.6 and removed in Python 3.12 (https://docs.python.org/3.11/library/asyncore.html)
# The issue is tracked here: https://datastax-oss.atlassian.net/browse/PYTHON-1375 and is scheduled
# to be fixed in cassandra-driver 3.30.0.
There are a few more interesting failures:
- We have a few tests that were not testing the right things. They were calling assert mock.called_once() or assert mock.called_with(), which was wrong because those are attributes, not methods -> it should be mock.assert_called_once_with(). We need to fix those (and at least one of the tests for celery_executor is not called, so we need to check more closely). Maybe those will uncover real bugs.
- Apparently some mysql tests fail claiming that pickled data is too big to store in the DB, which I guess might be attributed to much more complex bytecode (with specialized interpreters and other speed improvements). But we need to look in more detail.
- Unfortunately it looks like we are held back by universal_pathlib and object storage / io.
We have:
____________________ ERROR collecting tests/io/test_path.py ____________________
ImportError while importing test module '/opt/airflow/tests/io/test_path.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/io/test_path.py:31: in <module>
from airflow.io.path import ObjectStoragePath
airflow/io/path.py:29: in <module>
from upath.implementations.cloud import CloudPath, _CloudAccessor
/usr/local/lib/python3.12/site-packages/upath/__init__.py:2: in <module>
from upath.core import UPath
/usr/local/lib/python3.12/site-packages/upath/core.py:8: in <module>
from pathlib import _PosixFlavour # type: ignore
E ImportError: cannot import name '_PosixFlavour' from 'pathlib' (/usr/local/lib/python3.12/pathlib.py)
----------- generated xml file: /files/test_result-other-sqlite.xml ------------
And the reason is that universal_pathlib does not support Python 3.12 yet.
The issue is here https://github.com/fsspec/universal_pathlib/issues/137 (and fix seems to be coming as flavours are retrieved differently now in https://github.com/fsspec/universal_pathlib/pull/152 )
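The import failure above comes down to a private pathlib name that no longer exists on 3.12. A minimal sketch of how that can be checked on any interpreter (the function name is made up for the example, and this is not how universal_pathlib itself handles it):

```python
"""Sketch: check whether the private pathlib flavour class is still present."""
import pathlib
import sys


def private_flavour_available():
    """Return True if pathlib still exposes the private _PosixFlavour name."""
    return hasattr(pathlib, "_PosixFlavour")


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:2])
    if private_flavour_available():
        print(f"Python {version}: pathlib._PosixFlavour is importable")
    else:
        print(f"Python {version}: pathlib._PosixFlavour is gone")
```

Relying on underscore-prefixed names from the standard library is fragile in general, which is why the fix has to come from the library rather than from a workaround on our side.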
cc: @bolkedebruin @uranusjr in case there might be some alternatives (but I do not expect) for universal_pathlib ^^. The problem with this one is that it's not a provider, and we cannot exclude it. It will hold us back from supporting Python 3.12 as a "hard" stop.
Hmm.. It looks like the 3.29 version of cassandra-driver should support Python 3.12
In a way. Looking to the issues and discussions there:
- They are technically Python 3.12 compatible
- However, the python client uses by default asyncore reactor - and asyncore has been removed from Python 3.12 - see for example here https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1040098
- Apparently the natural replacement for asyncore reactor is asyncio reactor
- The asyncio reactor is technically experimental https://docs.datastax.com/en/developer/python-driver/3.29/api/cassandra/io/asyncioreactor/
- There is a PR - not merged yet - that implements various fixes and stabilization changes for various Python versions in asyncio https://github.com/datastax/python-driver/pull/1189 - as you know asyncio was evolving even in 3.8, 3.9, 3.10, 3.11, 3.12 and various versions had various behaviours
- the https://datastax-oss.atlassian.net/browse/PYTHON-1375 has been created with the goal of merging the stabilization changes and adding all necessary tests to make asyncio default reactor. Once done this should be a "drop-in" replacement and it should just work without us doing anything. It is scheduled for 3.30 of python client.
- Technically speaking (though I have no idea how and what are consequences and potential issues) we could switch to asyncio reactor only for 3.12, however, I think we need someone who has and uses cassandra and could test it
Looking at all that - I think the best course of action is to wait until they implement all the stabilization and tests. This will - inevitably - happen, requires the least effort from our side, and we avoid having to solve potential issues that will be uncovered and fixed during the stabilization of the asyncio reactor by Cassandra maintainers.
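Purely to illustrate the asyncore point (the conclusion above is still to wait for the driver), this is roughly how one could detect whether asyncore is available on the running interpreter. The function names and the returned labels are hypothetical and do not refer to cassandra-driver's API.

```python
"""Sketch: detect whether the stdlib asyncore module can be found."""
import importlib.util


def asyncore_available():
    """Return True if an importable 'asyncore' module exists (not imported here)."""
    return importlib.util.find_spec("asyncore") is not None


def pick_reactor_name():
    """Return a label for the reactor family that could work on this interpreter."""
    # Hypothetical labels; switching reactors in a real client needs testing.
    return "asyncore" if asyncore_available() else "asyncio"


if __name__ == "__main__":
    print(pick_reactor_name())
```

Note that a third-party backport of asyncore installed in the environment would also make the first branch true, so this only reflects what is importable, not the Python version.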
Oh... I missed this point: "Apparently the natural replacement for asyncore reactor is asyncio reactor"
If I were a cassandra-driver maintainer, I would look at anyio rather than native asyncio
Generally speaking, we might also have a look at anyio for the triggerer, but that is another story and another challenge 😄
Yeah. I think we should not even attempt to do something different than the "official" approach of cassandra. It's easy for us to simply remove Python 3.12 from supported versions in Cassandra until they fix it (they will, eventually) - it does not block anyone (if they want cassandra, they might use 3.11). If anything, doing that will simply pressurise cassandra guys (we will link the issues and will tell them we are disabling cassandra for Python 3.12) and incentivise them to fix it.