Python3.12

Open potiuk opened this issue 1 year ago • 48 comments

^ Add meaningful description above Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

potiuk avatar Jan 12 '24 15:01 potiuk

cc: @dirrao @Taragolis -> seems like apache-beam having numpy as dependency is the next problem to solve after pendulum is solved

 Downloading numpy-1.24.4.tar.gz (10.9 MB)
  #56 70.61            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.9/10.9 MB 36.6 MB/s eta 0:00:00
  #56 70.61         Installing build dependencies: started
  #56 70.61         Installing build dependencies: finished with status 'done'
  #56 70.61         Getting requirements to build wheel: started
  #56 70.61         Getting requirements to build wheel: finished with status 'error'
  #56 70.61         error: subprocess-exited-with-error
  #56 70.61       
  #56 70.61         × Getting requirements to build wheel did not run successfully.
  #56 70.61         │ exit code: 1
  #56 70.61         ╰─> [33 lines of output]
  #56 70.61             Traceback (most recent call last):
  #56 70.61               File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
  #56 70.61                 main()
  #56 70.61               File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
  #56 70.61                 json_out['return_val'] = hook(**hook_input['kwargs'])
  #56 70.61                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  #56 70.61               File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 112, in get_requires_for_build_wheel
  #56 70.61                 backend = _build_backend()
  #56 70.61                           ^^^^^^^^^^^^^^^^
  #56 70.61               File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 77, in _build_backend
  #56 70.61                 obj = import_module(mod_path)
  #56 70.61                       ^^^^^^^^^^^^^^^^^^^^^^^

potiuk avatar Jan 12 '24 15:01 potiuk

Looks like we need NumPy 1.26+ - from that long discussion here: https://github.com/numpy/numpy/issues/23808 and Apache Beam has <1.25 even in main: https://github.com/apache/beam/blob/master/sdks/python/setup.py#L304

So likely the next best thing to do is to exclude apache-beam provider for python 3.12

This is kinda expected, Beam is always dragging us behind

potiuk avatar Jan 12 '24 15:01 potiuk

Pushed a fixup marking it for exclusion - let's see.

potiuk avatar Jan 12 '24 15:01 potiuk

Seems like 1.26.0 is the first numpy release that officially supports 3.12:

  • https://github.com/numpy/numpy/releases/tag/v1.26.0rc1

Taragolis avatar Jan 12 '24 15:01 Taragolis

Oh... I posted my comment without refreshing the page, and you had already found the same thing.

Taragolis avatar Jan 12 '24 15:01 Taragolis

Seems like the Google provider is also not compatible with 3.12 yet:

46.9 ERROR: Could not find a version that satisfies the requirement google-ads>=22.1.0; extra == "google" (from apache-airflow[google]) (from versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.3.1, 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.4.1, 3.0.0, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 4.0.0, 4.1.0, 4.1.1, 5.0.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.1.0, 6.0.0, 7.0.0, 8.0.0, 8.1.0, 8.2.0, 9.0.0, 10.0.0, 11.0.0, 11.0.1, 11.0.2, 12.0.0, 13.0.0, 14.0.0, 14.0.1, 14.1.0, 15.0.0, 15.1.0, 15.1.1, 16.0.0, 17.0.0, 18.0.0, 18.1.0, 18.2.0, 19.0.0, 20.0.0, 21.0.0, 21.1.0, 21.2.0, 21.3.0, 22.0.0)
  246.9 ERROR: No matching distribution found for google-ads>=22.1.0; extra == "google"

The latest google-ads package is pinned to Python >=3.7, <3.12.

An issue for adding Python 3.12 support already exists: https://github.com/googleads/google-ads-python/issues/813

Taragolis avatar Jan 12 '24 16:01 Taragolis

All right ... let me exclude the google provider too then. At this stage I have a feeling that excluding a few providers - even huge and important ones - and having an open issue to bring the 3.12 support back in would be a good thing.

And I know for a fact that the Google team wants to split the google provider, and splitting out ads was the first thing to try anyway, so that might accelerate things a bit.
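
A rough sketch of the exclusion idea discussed here, assuming a hypothetical in-memory mapping from provider name to the Python versions it is excluded from. The names `EXCLUDED_PYTHON_VERSIONS` and `installable_providers` are illustrative only, not Airflow's actual build code, and the data is just a snapshot of this point in the thread (beam and google excluded for 3.12):

```python
import sys

# Hypothetical snapshot: provider -> Python versions it cannot be installed on.
EXCLUDED_PYTHON_VERSIONS = {
    "apache.beam": {"3.12"},
    "google": {"3.12"},
    "sqlite": set(),
}


def installable_providers(python_version: str) -> list[str]:
    """Return providers that are not excluded for the given 'major.minor' version."""
    return sorted(
        name
        for name, excluded in EXCLUDED_PYTHON_VERSIONS.items()
        if python_version not in excluded
    )


# Show what would be installable on the interpreter running this snippet.
current = f"{sys.version_info.major}.{sys.version_info.minor}"
print(current, installable_providers(current))
```

Keeping the exclusions as plain data is what makes it "immediately visible what is excluded": supporting a provider again is a one-line change.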

potiuk avatar Jan 12 '24 17:01 potiuk

Pushed.

potiuk avatar Jan 12 '24 17:01 potiuk

BTW. I really like how nicely and transparently the new pyproject.toml exclusion works now: pre-commit simply updates pyproject.toml and it's immediately visible what is excluded :D

potiuk avatar Jan 12 '24 17:01 potiuk

I think one day we need to finally resolve "Consider splitting Google Provider", because the google provider is really huge (28k+ lines tracked by our tests) and contains quite a few different components:

  • Cloud (GCP)
  • Google ADS
  • Google Suite
  • LevelDB
  • Google Firebase (is it part of GCP?)

So if we find a way to do it, it might prevent the situation where one of these components becomes a showstopper for the others.

Taragolis avatar Jan 12 '24 17:01 Taragolis

I think one day we need to finally resolve "Consider splitting Google Provider", because the google provider is really huge (28k+ lines tracked by our tests) and contains quite a few different components:

This is precisely the plan I am discussing with Google team :)

potiuk avatar Jan 12 '24 17:01 potiuk

So we have a duckdb dependency issue now. It already has Python 3.12 support, added by https://github.com/duckdb/duckdb/pull/10144 (Linux and macOS), but it is not released yet.

Taragolis avatar Jan 12 '24 17:01 Taragolis

Seems the 0.9.3.dev2258 pre-release is the first one that supports 3.12:

❯ pip install duckdb==0.9.3.dev2258
Collecting duckdb==0.9.3.dev2258
  Obtaining dependency information for duckdb==0.9.3.dev2258 from https://files.pythonhosted.org/packages/3c/43/094637a1939e8ba6ae53a788bd46adfd0b71fe0a6e182c8e6179b2966e09/duckdb-0.9.3.dev2258-cp312-cp312-macosx_11_0_arm64.whl.metadata
  Downloading duckdb-0.9.3.dev2258-cp312-cp312-macosx_11_0_arm64.whl.metadata (768 bytes)
Downloading duckdb-0.9.3.dev2258-cp312-cp312-macosx_11_0_arm64.whl (13.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.7/13.7 MB 3.6 MB/s eta 0:00:00
Installing collected packages: duckdb
Successfully installed duckdb-0.9.3.dev2258

Taragolis avatar Jan 12 '24 18:01 Taragolis

Pushed a change for it :)

potiuk avatar Jan 12 '24 18:01 potiuk

One more step and a new error. This time it is related to LevelDB, which is also part of the google provider, but I guess it has a separate extra.

I think this is the same issue: https://github.com/wbolster/plyvel/issues/158

Taragolis avatar Jan 12 '24 22:01 Taragolis

Running

potiuk avatar Jan 13 '24 03:01 potiuk

Build Prod Images is still picking up the google ads provider.

#56 25.60 ERROR: Ignored the following versions that require a different python version: 2.7.3 Requires-Python <3.12,~=3.8; 2.7.3rc1 Requires-Python <3.12,~=3.8; 2.8.0 Requires-Python <3.12,~=3.8; 2.8.0b1 Requires-Python <3.12,~=3.8; 2.8.0rc1 Requires-Python <3.12,~=3.8; 2.8.0rc2 Requires-Python <3.12,~=3.8; 2.8.0rc3 Requires-Python <3.12,~=3.8; 2.8.0rc4 Requires-Python <3.12,~=3.8; 22.1.0 Requires-Python >=3.7, <3.12
  #56 25.60 ERROR: Could not find a version that satisfies the requirement google-ads>=22.1.0 (from apache-airflow-providers-google) (from versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.3.1, 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.4.1, 3.0.0, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 4.0.0, 4.1.0, 4.1.1, 5.0.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4, 5.1.0, 6.0.0, 7.0.0, 8.0.0, 8.1.0, 8.2.0, 9.0.0, 10.0.0, 11.0.0, 11.0.1, 11.0.2, 12.0.0, 13.0.0, 14.0.0, 14.0.1, 14.1.0, 15.0.0, 15.1.0, 15.1.1, 16.0.0, 17.0.0, 18.0.0, 18.1.0, 18.2.0, 19.0.0, 20.0.0, 21.0.0, 21.1.0, 21.2.0, 21.3.0, 22.0.0)
  #56 25.60 ERROR: No matching distribution found for google-ads>=22.1.0

dirrao avatar Jan 14 '24 09:01 dirrao

Yes. Because the google provider is installed, and expected to be installed, when the PROD image is built. So if we do not build it locally for Python 3.12, it will install the one from PyPI. Luckily it looks like the google ads maintainers are going to release a 3.12-compatible version this week https://github.com/googleads/google-ads-python/issues/813#issuecomment-1889826645 so we should - I think - wait for it. Releasing a 3.12 version without the google provider, when we know we will likely be able to install it in two days, is likely just not worth the effort (we would have to add code to exclude certain providers from PROD image installation).

In the meantime - we could take a close look at the failing tests for Python 3.12 (https://github.com/apache/airflow/actions/runs/7509965786/job/20448000561?pr=36755).

I think they mostly fail because google and beam providers are missing - but if there are any other failing tests we should look at them. I have not looked in detail yet but there are at least a few with "real" 3.12 incompatibilities (in test code at least) that could be fixed in the meantime:

           AttributeError: 'called_once' is not a valid assertion. Use a spec for the mock if 'called_once' is meant to be an attribute.
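
A minimal sketch of why this error shows up, using only the standard library `unittest.mock`. Before Python 3.12, accessing `mock.called_once` silently created a new (truthy) child mock, so `assert mock.called_once()` could never fail. From 3.12 on, a `Mock` raises `AttributeError` for names that look like an assertion without the `assert_` prefix. The `notify` function and the helper name below are made up for illustration:

```python
from unittest import mock


def notify(sender):
    """Toy function under test: sends a single greeting."""
    sender.send("hello")


sender = mock.Mock()
notify(sender)

# Correct: a real assertion method, fails if the call did not happen as stated.
sender.send.assert_called_once_with("hello")


def misspelled_assertion_is_rejected(m) -> bool:
    """True if the mock refuses 'called_once' (the Python 3.12+ behaviour)."""
    try:
        m.called_once()  # not an assertion: an auto-created attribute before 3.12
    except AttributeError:
        return True
    return False


print(misspelled_assertion_is_rejected(mock.Mock()))
```

On older interpreters the helper returns `False`, which is exactly the case where such a test passes without checking anything.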

potiuk avatar Jan 14 '24 10:01 potiuk

cc: @dirrao @Taragolis -> Ads released with 3.12 support https://pypi.org/project/google-ads/ - removed the limit from Google provider, let's see.

potiuk avatar Jan 23 '24 20:01 potiuk

Duckdb removed the devel version we had pinned for 3.12 -> replaced it with >= for the new devel

potiuk avatar Jan 23 '24 20:01 potiuk

Strangely enough the latest dev of duckdb does not support 3.12 - but the previous one does

potiuk avatar Jan 23 '24 21:01 potiuk

BTW. When you try to compile sdist from duckdb - what happens with your CPU is ... Interesting

I thought my desktop Linux turns into an :airplane:

potiuk avatar Jan 23 '24 21:01 potiuk

All right - it builds nicely on my machine - now we will just have to fix all the failing tests. Let's see how many of those there will be.

potiuk avatar Jan 23 '24 22:01 potiuk

Fixed (I think) the errors with called_once in a few tests. We also have to exclude cassandra for now:

# Cassandra provider is not yet compatible with Python 3.12
# The main issue is that python cassandra driver by default uses asyncore which has been deprecated since
# Python 3.6 and removed in Python 3.12 (https://docs.python.org/3.11/library/asyncore.html)
# The issue is tracked here: https://datastax-oss.atlassian.net/browse/PYTHON-1375 and is scheduled
# to be fixed in cassandra-driver 3.30.0.
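
A small check, as a sketch, of the condition described in that comment: whether the standard-library `asyncore` module is still available on the running interpreter. The helper name `asyncore_available` is illustrative; note that a third-party backport installed in the environment would also make it return `True`:

```python
import importlib.util
import sys


def asyncore_available() -> bool:
    """Return True if an 'asyncore' module can be found without importing it."""
    return importlib.util.find_spec("asyncore") is not None


print(sys.version_info[:2], asyncore_available())
```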

potiuk avatar Jan 23 '24 23:01 potiuk

There are a few more interesting failures:

  • we have a few tests that were not testing the right things. They were calling assert mock.called_once() or assert mock.called_with(), which was wrong - because those are not assertion methods, just auto-created mock attributes, so the assert always passed -> it should be mock.assert_called_once_with(). We need to fix those (and at least one of the tests for celery_executor is not called, so we need to check more closely) - maybe those will uncover real bugs.

  • apparently some mysql tests fail claiming that pickled data is too big to store in the DB - which I guess might be attributed to much more complex bytecode (with specialized interpreters and other speed improvements). But we need to look in more detail

  • Unfortunately it looks like we are held back by universal_pathlib and object storage / io.

We have

____________________ ERROR collecting tests/io/test_path.py ____________________
ImportError while importing test module '/opt/airflow/tests/io/test_path.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/io/test_path.py:31: in <module>
    from airflow.io.path import ObjectStoragePath
airflow/io/path.py:29: in <module>
    from upath.implementations.cloud import CloudPath, _CloudAccessor
/usr/local/lib/python3.12/site-packages/upath/__init__.py:2: in <module>
    from upath.core import UPath
/usr/local/lib/python3.12/site-packages/upath/core.py:8: in <module>
    from pathlib import _PosixFlavour  # type: ignore
E   ImportError: cannot import name '_PosixFlavour' from 'pathlib' (/usr/local/lib/python3.12/pathlib.py)
----------- generated xml file: /files/test_result-other-sqlite.xml ------------

And the reason is that universal_pathlib does not support Python 3.12 yet.

The issue is here https://github.com/fsspec/universal_pathlib/issues/137 (and fix seems to be coming as flavours are retrieved differently now in https://github.com/fsspec/universal_pathlib/pull/152 )
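
To illustrate the failure mode (a sketch, not the universal_pathlib fix): the traceback comes from importing a private name from `pathlib`, and private names can disappear between Python releases. The helper name below is made up; it only probes for the attribute without relying on it:

```python
import pathlib
import sys


def has_private_flavour_classes() -> bool:
    """True if this interpreter's pathlib still exposes the private _PosixFlavour."""
    return hasattr(pathlib, "_PosixFlavour")


# Public classes such as PurePosixPath are stable across versions.
print(sys.version_info[:2], has_private_flavour_classes())
print(pathlib.PurePosixPath("/opt/airflow") / "tests")
```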

potiuk avatar Jan 24 '24 00:01 potiuk

cc: @bolkedebruin @uranusjr in case there might be some alternatives for universal_pathlib (but I do not expect any) ^^. The problem with this one is that it's not a provider, and we cannot exclude it. It will hold us back from supporting Python 3.12 as a "hard" stop.

potiuk avatar Jan 24 '24 00:01 potiuk

Fixed (I think) the errors with called_once in a few tests. We also have to exclude cassandra for now:

Hmm.. It looks like the 3.29 version of cassandra-driver should support Python 3.12

Taragolis avatar Jan 24 '24 08:01 Taragolis

Hmm.. It looks like the 3.29 version of cassandra-driver should support Python 3.12

In a way. Looking to the issues and discussions there:

  • They are technically Python 3.12 compatible
  • However, the python client uses by default asyncore reactor - and asyncore has been removed from Python 3.12 - see for example here https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1040098
  • Apparently the natural replacement for asyncore reactor is asyncio reactor
  • The asyncio reactor is technically experimental https://docs.datastax.com/en/developer/python-driver/3.29/api/cassandra/io/asyncioreactor/
  • There is a - not merged yet - PR that implements various fixes and stabilization changes for various Python versions in asyncio https://github.com/datastax/python-driver/pull/1189 - as you know asyncio was evolving even in 3.8, 3.9, 3.10, 3.11, 3.12 and various versions had various behaviours
  • the https://datastax-oss.atlassian.net/browse/PYTHON-1375 has been created with the goal of merging the stabilization changes and adding all necessary tests to make asyncio default reactor. Once done this should be a "drop-in" replacement and it should just work without us doing anything. It is scheduled for 3.30 of python client.
  • Technically speaking (though I have no idea how and what are consequences and potential issues) we could switch to asyncio reactor only for 3.12, however, I think we need someone who has and uses cassandra and could test it

Looking at all that - I think the best course of action is to wait until they implement all the stabilization and tests. This will - inevitably - happen, requires the least effort from our side and we avoid the case that we will have to solve some potential issues that will be uncovered and fixed during the stabilization of asyncio reactor by Cassandra maintainers.
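
For reference, the per-version reactor switch mentioned in the last bullet could look roughly like the sketch below. It assumes cassandra-driver's `cassandra.io.asyncorereactor.AsyncoreConnection` and `cassandra.io.asyncioreactor.AsyncioConnection` classes and the `connection_class` argument of `Cluster`. The helper names are hypothetical and this has not been tested against a real cluster, which is exactly the concern raised above:

```python
import sys
from importlib import import_module

ASYNCORE = "cassandra.io.asyncorereactor.AsyncoreConnection"
ASYNCIO = "cassandra.io.asyncioreactor.AsyncioConnection"


def pick_connection_class_path(version_info=sys.version_info) -> str:
    """Choose the (experimental) asyncio reactor only where asyncore is gone."""
    return ASYNCIO if version_info >= (3, 12) else ASYNCORE


def load_connection_class(path: str):
    """Import the class lazily, so this module loads without cassandra-driver."""
    module_name, _, class_name = path.rpartition(".")
    return getattr(import_module(module_name), class_name)


# Usage (requires cassandra-driver and a reachable cluster; not executed here):
# from cassandra.cluster import Cluster
# conn_cls = load_connection_class(pick_connection_class_path())
# cluster = Cluster(["127.0.0.1"], connection_class=conn_cls)
print(pick_connection_class_path())
```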

potiuk avatar Jan 24 '24 08:01 potiuk

Oh... I missed that

Apparently the natural replacement for asyncore reactor is asyncio reactor

If I were a cassandra-driver maintainer I would look at anyio rather than native asyncio

Generally speaking, we also might have a look at anyio for the triggerer, but that is another story and another challenge 😄

Taragolis avatar Jan 25 '24 01:01 Taragolis

Generally speaking, we also might have a look at anyio for the triggerer, but that is another story and another challenge

Yeah. I think we should not even attempt to do something different than the "official" approach of cassandra. It's easy for us to simply remove Python 3.12 from supported versions in Cassandra until they fix it (they will, eventually) - it does not block anyone (if they want cassandra, they might use 3.11). If anything, doing that will simply pressurise the cassandra guys (we will link the issues and will tell them we are disabling cassandra for Python 3.12) and incentivise them to fix it.

potiuk avatar Jan 25 '24 10:01 potiuk