datahub
ci(ingest): test with python 3.11
Currently blocked on https://github.com/googleapis/python-bigquery-sqlalchemy/issues/500.
Checklist
- [ ] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
- [ ] Links to related issues (if applicable)
- [ ] Tests for the changes have been added/updated (if applicable)
- [ ] Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
- [ ] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub
Unit Test Results (build & test)
- 621 tests ±0: 617 :heavy_check_mark: ±0, 4 :zzz: ±0, 0 :x: ±0
- 157 suites ±0, 157 files ±0
- 15m 53s :stopwatch: +10s
Results for commit 8cd620c1. ± Comparison against base commit b7c03731.
Unit Test Results (metadata ingestion)
- 4 files -4, 4 suites -4
- 28m 6s :stopwatch: -29m 36s
- 759 tests ±0: 755 :heavy_check_mark: -1, 4 :zzz: +1, 0 :x: ±0
- 760 runs -760: 756 :heavy_check_mark: -757, 4 :zzz: -3, 0 :x: ±0
Results for commit 8cd620c1. ± Comparison against base commit b7c03731.
This pull request skips 1 test.
tests.integration.feast.test_feast_repository ‑ test_feast_repository_ingest
~~Currently blocked on https://github.com/googleapis/python-bigquery-sqlalchemy/issues/500.~~
Now blocked on https://github.com/googleapis/python-bigquery-sqlalchemy/pull/543
Looks like all tests are failing now.
Looks like we're somehow pinning pyarrow to an old version, which doesn't have a pre-built binary for python 3.11. Ideally we should loosen our deps, but we can also add the necessary requirements so that it can build from source for 3.11 instead.
Blocked because of https://github.com/feast-dev/feast/issues/3510
No longer blocked on feast.
Now we're blocked on https://github.com/cloudera/python-sasl/issues/30 (we actually use sasl3 and not sasl but they face the same issue). We only depend on sasl3 from acryl-pyhive, so we probably could upgrade it to use pure-sasl instead. There seems to be an effort around that already https://github.com/dropbox/PyHive/pull/454.
Just need to merge https://github.com/acryldata/PyHive/pull/7 and update here, and then we should be good to go.
@hsheth2
I made a couple of contributions to PyHive, which were accepted and released in 0.7.1.dev0. Could you test with the dev version and report any bugs in the PyHive GitHub repository before 0.7.1 is released in a month or so?
- PyHive also supports pure-sasl via additional extras 'pyhive[hive_pure_sasl]' which supports Python 3.11 in addition to previous Python versions. See https://github.com/dropbox/PyHive/pull/454
- PyHive is now compatible with SQLAlchemy 2.0. See https://github.com/dropbox/PyHive/pull/457
Looks like there are two errors now:
- PowerBI using a mutable type for a dataclass default
- Something related to typing (with snowflake?), similar to https://github.com/python/cpython/issues/100316
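For context on the first error: starting with Python 3.11, `dataclasses` rejects any unhashable default value, not just `list`, `dict`, and `set`. A regular dataclass instance (with the default `eq=True` and no `frozen=True`) is unhashable, so using one as a field default now raises `ValueError` at class definition time. The sketch below reproduces this with hypothetical `Workspace`, `BadConfig`, and `GoodConfig` classes; these names are illustrative only and are not taken from the PowerBI source.

```python
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workspace:  # hypothetical stand-in for a nested config object
    name: str = "default"


# On Python 3.11+, defining this class raises ValueError because the
# default is an unhashable dataclass instance. On 3.10 and earlier it
# is accepted silently and the instance is shared between objects.
error: Optional[str] = None
try:

    @dataclass
    class BadConfig:
        workspace: Workspace = Workspace()

except ValueError as exc:
    error = str(exc)


# Works on every supported version: default_factory builds a fresh
# default for each new instance.
@dataclass
class GoodConfig:
    workspace: Workspace = field(default_factory=Workspace)
    tags: List[str] = field(default_factory=list)


if __name__ == "__main__":
    print(sys.version_info[:2], "error:", error)
    a, b = GoodConfig(), GoodConfig()
    print(a.workspace is b.workspace)  # False: each instance has its own default
```

Switching the offending defaults to `field(default_factory=...)` keeps the behaviour identical on older Python versions while satisfying the stricter 3.11 check.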
- The powerbi issue will be fixed by https://github.com/datahub-project/datahub/pull/8609
- Looks like the CodeType error is from pyspark and fixed in recent versions, so we just need to upgrade: blocked on https://github.com/datahub-project/datahub/pull/8638
- There's also an issue with the ratelimiter package - we'll need to find an alternative because the project looks dead: https://github.com/RazerM/ratelimiter/issues/18
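If no maintained replacement for `ratelimiter` turns out to be suitable, one option is a small in-house limiter built only on the standard library. The following is a minimal sketch under that assumption, not the approach DataHub settled on: `SimpleRateLimiter` is a hypothetical name, it is synchronous only, and it is not thread-safe. The `clock` and `sleep` parameters exist purely so the behaviour can be tested without real waiting.

```python
import collections
import time
from typing import Callable, Deque


class SimpleRateLimiter:
    """Hypothetical limiter: at most `max_calls` entries per `period` seconds."""

    def __init__(
        self,
        max_calls: int,
        period: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_calls <= 0 or period <= 0:
            raise ValueError("max_calls and period must be positive")
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = collections.deque()  # entry timestamps

    def __enter__(self) -> "SimpleRateLimiter":
        while True:
            now = self._clock()
            # Drop timestamps that have left the sliding window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                break
            # Window is full: wait until the oldest entry expires.
            self._sleep(self.period - (now - self._calls[0]))
        self._calls.append(now)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        return None


if __name__ == "__main__":
    limiter = SimpleRateLimiter(max_calls=2, period=0.5)
    for i in range(4):
        with limiter:
            print("call", i, round(time.monotonic(), 2))
```

Used as `with limiter:` around each outbound request, this mirrors the context-manager style that `ratelimiter` offered, which would keep call sites largely unchanged.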
Unblocked now that https://github.com/datahub-project/datahub/pull/9008 was merged
One more thing here: To be compatible with Python 3.11, we need to be on pyspark 3.4+ (see https://github.com/apache/spark/pull/38987)
However, pydeequ still depends on pyspark 3.3 right now (https://github.com/awslabs/python-deequ/pull/168). The underlying deequ issue was recently fixed (https://github.com/awslabs/deequ/pull/505), so pydeequ will hopefully get updated quickly.