metadata-ingestion: Update great-expectations dependency from 0.15 to 0.16
Currently, DataHub depends on great-expectations <= 0.15.50, which is no longer actively maintained. The latest version is 0.16.13, which adds Fluent Datasources that make GX much more user friendly.
However, the new releases remove deprecated code that is used by DataHub, e.g., SQLAlchemyDataset/Datasource in the data profiler and probably some data-asset related stuff in the GX action.
Please update the dependency to 0.16 so that our users can use the new GX version with the datahub action.
This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io
This issue was closed because it has been inactive for 30 days since being marked as stale.
Any update on this?
This issue is on our radar, but unfortunately isn't a simple fix because of the level of customization and patching we've done in our existing GX-based data profilers. We've had some conversations with the GX team around what it would take to get this done, and are working to scope it accordingly.
Any updates on this issue? I mean, GX is at 0.18 in the meantime :)
Any updates on this? They are about to move to 1.x.x :)
To make datahub work with recent airflow, GE needs to be bumped to at least 0.16.8. Currently it clashes on urllib3: older GE versions pin it to 1.26, while airflow is pinned to 2.x, as is botocore for python 3.10+.
Thus, great-expectations (>=0.15.12,<0.15.50) requires urllib3 (>=1.25.4,<1.27)
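To see which bounds an installed distribution actually declares, the standard library's importlib.metadata can list them. The following is only a minimal sketch: the helper name declared_requirements is hypothetical, the match is a plain substring test, and it simply returns an empty list when the distribution is not installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(dist_name: str, needle: str) -> list[str]:
    """Return the requirement strings of `dist_name` that mention `needle`.

    Hypothetical helper for illustration; uses a simple substring match.
    """
    try:
        # requires() returns None when the distribution declares no requirements.
        reqs = requires(dist_name) or []
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    return [r for r in reqs if needle.lower() in r.lower()]


if __name__ == "__main__":
    # Print whatever urllib3 constraint the installed great-expectations declares.
    for line in declared_requirements("great-expectations", "urllib3"):
        print(line)
```

Running this inside the environment that fails to resolve shows the exact specifier that the resolver is complaining about.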
Are there any loose timelines around when this can be resolved?
I'm sorry, I know these sorts of "me too" comments are rarely of much help. I wanted to highlight that great-expectations at the pinned version has a variety of upper bounds constraints: https://raw.githubusercontent.com/great-expectations/great_expectations/0.15.50/requirements.txt
altair>=4.0.0,<4.2.1
pydantic>=1.10.4,<2.0
urllib3>=1.25.4,<1.27
And at least for us the problem isn't so much that "great expectations is old" but that being on the lower side of these transitive dependencies -- like the pydantic v1-v2 transitions -- has ever increasing opportunity costs. (In our particular transitive set pydantic <2 is also keeping us on pandas<2, which adds further to the expense.)
I know this doesn't change anything about the difficulty of migration, but I hope it clarifies the "cost" somewhat when this issue is next triaged.
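As a rough illustration of how these upper bounds exclude newer releases, here is a small stdlib-only sketch. It is a simplification, not a PEP 440 implementation: it only understands purely numeric dotted versions and the operators >=, <=, ==, > and <. The PINS mapping copies the three bounds quoted above; the function names and the candidate versions in the usage lines are chosen for illustration only.

```python
import operator
import re

# Bounds quoted from the great-expectations 0.15.50 requirements.txt above.
PINS = {
    "altair": ">=4.0.0,<4.2.1",
    "pydantic": ">=1.10.4,<2.0",
    "urllib3": ">=1.25.4,<1.27",
}

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

_CLAUSE = re.compile(r"\s*(>=|<=|==|>|<)\s*([\d.]+)\s*")


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a numeric dotted version such as '1.26.18' into (1, 26, 18)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a numeric version against a comma-separated specifier string."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound_text = match.groups()
        bound = parse_version(bound_text)
        # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
        width = max(len(candidate), len(bound))
        lhs = candidate + (0,) * (width - len(candidate))
        rhs = bound + (0,) * (width - len(bound))
        if not _OPS[op](lhs, rhs):
            return False
    return True


if __name__ == "__main__":
    # Illustrative candidates: one release inside the bound, one outside it.
    for name, version in [("urllib3", "1.26.18"), ("urllib3", "2.0.7"),
                          ("pydantic", "1.10.13"), ("pydantic", "2.5.0")]:
        print(name, version, satisfies(version, PINS[name]))
```

For real resolution work a full PEP 440 parser is the better tool; the sketch is only meant to make the "upper bound" cost concrete.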
Any updates on this? The latest version of datahub_action for GX also needs to get updated to reflect the latest changes. It is a one line change tho.
Just want to clarify which of these issues people are trying to solve:
- Use datahub_action with latest GX
- Install datahub ingestion sources inside one big venv (e.g. airflow)
@shirshanka for the datahub action to work with the latest version of GX I managed to just modify a couple of lines of code to fix the class constructor function. But the bigger issue is that if we have airflow installed with the datahub plugin we cannot use the latest version of GX in our dags due to version conflict.
> Install datahub ingestion sources inside one big venv (e.g. airflow)
This one. We use a monorepo and minimizing the number of transitive dependency sets we are juggling maximizes the usefulness of said monorepo.
@shirshanka It looks like the changes that introduced pydantic v2 support in great-expectations will be easy to backport to 0.15.50. If I do that, would datahub consider using them as a springboard to support pydantic v2 for plugins?
If anyone wants it, I pushed it up to my fork, and here's the diff from 0.15.50. I am going to try patching datahub on a fork to consume this version of great expectations, and see if that works for us.