
ci(ingest): test with python 3.11

Open hsheth2 opened this issue 3 years ago • 2 comments

Currently blocked on https://github.com/googleapis/python-bigquery-sqlalchemy/issues/500.

Checklist

  • [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [ ] Links to related issues (if applicable)
  • [ ] Tests for the changes have been added/updated (if applicable)
  • [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

hsheth2 Nov 18 '22 00:11

Unit Test Results (build & test)

621 tests ±0   617 :heavy_check_mark: ±0   15m 53s :stopwatch: +10s
157 suites ±0   4 :zzz: ±0
157 files ±0   0 :x: ±0

Results for commit 8cd620c1. ± Comparison against base commit b7c03731.

github-actions[bot] Nov 18 '22 02:11

Unit Test Results (metadata ingestion)

4 files -4   4 suites -4   28m 6s :stopwatch: -29m 36s
759 tests ±0   755 :heavy_check_mark: -1   4 :zzz: +1   0 :x: ±0
760 runs -760   756 :heavy_check_mark: -757   4 :zzz: -3   0 :x: ±0

Results for commit 8cd620c1. ± Comparison against base commit b7c03731.

This pull request skips 1 test.
tests.integration.feast.test_feast_repository ‑ test_feast_repository_ingest

github-actions[bot] Nov 18 '22 02:11

~~Currently blocked on https://github.com/googleapis/python-bigquery-sqlalchemy/issues/500.~~

Now blocked on https://github.com/googleapis/python-bigquery-sqlalchemy/pull/543

hsheth2 Jan 11 '23 03:01

Looks like all tests are failing now.

anshbansal Feb 03 '23 10:02

Looks like we're somehow pinning pyarrow to an old version, which doesn't have a pre-built binary for python 3.11. Ideally we should loosen our deps, but we can also add the necessary requirements so that it can build from source for 3.11 instead.

hsheth2 Feb 07 '23 14:02

Blocked because of https://github.com/feast-dev/feast/issues/3510

hsheth2 Feb 27 '23 20:02

No longer blocked on feast.

Now we're blocked on https://github.com/cloudera/python-sasl/issues/30 (we actually use sasl3 and not sasl, but they face the same issue). We only depend on sasl3 through acryl-pyhive, so we could probably upgrade it to use pure-sasl instead. There already seems to be an effort around that: https://github.com/dropbox/PyHive/pull/454.

hsheth2 Jun 07 '23 18:06

Just need to merge https://github.com/acryldata/PyHive/pull/7 and update here, and then we should be good to go.

hsheth2 Jun 29 '23 22:06

@hsheth2

I made a couple of contributions to PyHive which were accepted and released in 0.7.1.dev0. You are requested to test with the dev version and report any bugs in the PyHive GitHub repository before 0.7.1 is released in a month or so.

  1. PyHive also supports pure-sasl via additional extras 'pyhive[hive_pure_sasl]' which supports Python 3.11 in addition to previous Python versions. See https://github.com/dropbox/PyHive/pull/454
  2. PyHive is now compatible with SQLAlchemy 2.0. See https://github.com/dropbox/PyHive/pull/457

mdeshmu Jul 14 '23 14:07

Looks like there are two errors now:

  1. PowerBI using a mutable type for a dataclass default
  2. Something related to typing (with snowflake?), similar to https://github.com/python/cpython/issues/100316
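For context on the first error: starting with Python 3.11, `dataclasses` rejects any unhashable object as a field default, where earlier versions only rejected `list`, `dict`, and `set` instances. A default that is itself an instance of a regular (non-frozen) dataclass is unhashable, so a class that imported fine on 3.10 raises `ValueError` at definition time on 3.11. The sketch below illustrates that class of failure and the usual `default_factory` fix; `Credentials`, `BrokenConfig`, and `FixedConfig` are hypothetical names, not the actual PowerBI source classes.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class Credentials:
    # Hypothetical nested config. eq=True without frozen=True sets
    # __hash__ to None, so instances are unhashable.
    token: str = ""


def define_broken_config() -> bool:
    """Return True if defining the dataclass raises ValueError."""
    try:
        @dataclass
        class BrokenConfig:
            # A shared instance as the default: accepted on Python <= 3.10,
            # rejected on 3.11+ because the default is unhashable.
            creds: Credentials = Credentials()
    except ValueError:
        return True
    return False


@dataclass
class FixedConfig:
    # default_factory builds a fresh Credentials per instance and is
    # accepted on every Python version that has dataclasses.
    creds: Credentials = field(default_factory=Credentials)


rejected = define_broken_config()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"broken definition rejected = {rejected}")
```

Beyond satisfying the 3.11 check, `default_factory` also avoids every instance silently sharing one mutable default object.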

hsheth2 Aug 09 '23 21:08

  • The powerbi issue will be fixed by https://github.com/datahub-project/datahub/pull/8609
  • Looks like the CodeType error is from pyspark and fixed in recent versions, so we just need to upgrade: blocked on https://github.com/datahub-project/datahub/pull/8638
  • There's also an issue with the ratelimiter package - we'll need to find an alternative because the project looks dead: https://github.com/RazerM/ratelimiter/issues/18
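On the ratelimiter point, one possible direction is a small in-house limiter built only on the standard library. The following is a minimal sketch under stated assumptions, not the replacement that was actually adopted: `SimpleRateLimiter` is a hypothetical name, it is synchronous only, it is not thread-safe, and it records a call when the `with` block is entered. The `clock` and `sleep` parameters exist only so the behaviour can be exercised without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SimpleRateLimiter:
    """Allow at most `max_calls` entries into the `with` block per `period` seconds."""

    def __init__(
        self,
        max_calls: int,
        period: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls <= 0 or period <= 0:
            raise ValueError("max_calls and period must be positive")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent entries

    def __enter__(self) -> "SimpleRateLimiter":
        while True:
            now = self._clock()
            # Forget entries that have left the sliding window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                break
            # Window is full: wait until the oldest entry expires, then re-check.
            self._sleep(self.period - (now - self._calls[0]))
        self._calls.append(now)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        return None


# Usage: three calls against a limit of five per second never block.
limiter = SimpleRateLimiter(max_calls=5, period=1.0)
for _ in range(3):
    with limiter:
        pass
```

A sliding window over a `deque` keeps the state small (at most `max_calls` timestamps); code that is called from multiple threads would additionally need a lock around `__enter__`.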

hsheth2 Aug 10 '23 19:08

Unblocked now that https://github.com/datahub-project/datahub/pull/9008 was merged

hsheth2 Oct 23 '23 22:10

One more thing here: To be compatible with Python 3.11, we need to be on pyspark 3.4+ (see https://github.com/apache/spark/pull/38987)

However, pydeequ still depends on pyspark 3.3 right now (https://github.com/awslabs/python-deequ/pull/168). The underlying deequ issue was recently fixed (https://github.com/awslabs/deequ/pull/505), so pydeequ will hopefully get updated quickly.
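The constraint above can be written down as a small guard. This is only a sketch of the rule stated in this thread (Python 3.11 needs pyspark 3.4+); `needs_pyspark_upgrade` is a hypothetical helper, and it assumes a version string that starts with a plain numeric `major.minor` such as `"3.3.2"`.

```python
import sys
from typing import Tuple


def needs_pyspark_upgrade(
    pyspark_version: str,
    python_version: Tuple[int, int] = sys.version_info[:2],
) -> bool:
    """Return True when running on Python 3.11+ with pyspark older than 3.4."""
    # Only the first two components matter for this rule.
    major, minor = (int(part) for part in pyspark_version.split(".")[:2])
    return python_version >= (3, 11) and (major, minor) < (3, 4)


print(needs_pyspark_upgrade("3.3.2", (3, 11)))
print(needs_pyspark_upgrade("3.4.0", (3, 11)))
```

The helper deliberately says nothing about other Python and pyspark combinations; it only encodes the 3.11 requirement discussed here.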

hsheth2 Oct 28 '23 00:10