metadata-ingestion: Update great-expectations dependency from 0.15 to 0.16

Open vrld opened this issue 1 year ago • 17 comments

Currently, DataHub depends on great-expectations <= 0.15.50, which is no longer actively maintained. The latest version is 0.16.13, which adds Fluent Datasources that make GX much more user-friendly.

However, the new releases remove deprecated code that is used by DataHub, e.g., SQLAlchemyDataset/Datasource in the data profiler and probably some data-asset related stuff in the GX action.

Please update the dependency to 0.16 so that our users can use the new GX version with the datahub action.
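
For anyone who wants to check which side of the removal their environment is on, here is a minimal sketch of an import probe. The module path `great_expectations.dataset` and the attribute name `SqlAlchemyDataset` are assumptions about where the legacy class lives; they are not confirmed in this issue, so adjust them to whatever your DataHub version actually imports.

```python
import importlib


def can_import(module_name, attr=None):
    """Return True if module_name imports and (optionally) exposes attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both "package not installed" and "submodule was removed".
        return False
    return attr is None or hasattr(module, attr)


# Assumed location of the legacy class; not taken from the issue text.
LEGACY_MODULE = "great_expectations.dataset"
LEGACY_CLASS = "SqlAlchemyDataset"

if can_import(LEGACY_MODULE, LEGACY_CLASS):
    print("legacy dataset API is importable")
else:
    print("legacy dataset API is not importable in this environment")
```

The probe only reports whether the import works; it says nothing about whether the profiler's patches would still apply.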

vrld avatar May 24 '23 07:05 vrld

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] avatar Jun 25 '23 02:06 github-actions[bot]

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions[bot] avatar Jul 26 '23 01:07 github-actions[bot]

Any update on this?

jelledv avatar Oct 12 '23 13:10 jelledv

This issue is on our radar, but unfortunately isn't a simple fix because of the level of customization and patching we've done in our existing GX-based data profilers. We've had some conversations with the GX team around what it would take to get this done, and are working to scope it accordingly.

hsheth2 avatar Nov 20 '23 22:11 hsheth2

Any updates on this issue? I mean, GX is at 0.18 in the meantime :)

DSchmidtDev avatar Feb 12 '24 15:02 DSchmidtDev

Any updates on this? They are about to move to 1.x.x :)

mateocolina avatar Apr 03 '24 09:04 mateocolina

To make DataHub work with recent Airflow, GE needs to be bumped to at least 0.16.8.

Currently it clashes on urllib3: older GE versions pin it to 1.26, while Airflow pins it to 2.x.

The same goes for botocore on Python 3.10+.

Thus, great-expectations (>=0.15.12,<0.15.50) requires urllib3 (>=1.25.4,<1.27)
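
The conflict in that resolver message can be illustrated with a small sketch. It uses plain tuple comparison and only handles simple numeric versions; a real check would more likely use a proper version-specifier library. The candidate `2.0.0` is an assumption standing in for "the 2.x line", and `1.26.18` is just an example release inside the allowed range.

```python
def parse(version):
    """Turn '1.26.18' into (1, 26, 18); plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, lower, upper):
    """True if lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# Range quoted in the message above: urllib3 (>=1.25.4,<1.27)
GE_LOWER, GE_UPPER = "1.25.4", "1.27"

# Example candidates (assumptions, chosen for illustration).
for candidate in ("1.26.18", "2.0.0"):
    allowed = in_range(candidate, GE_LOWER, GE_UPPER)
    print(f"urllib3 {candidate} allowed by GE pin: {allowed}")
```

Any urllib3 2.x release falls outside the pinned range, so a single environment cannot satisfy both requirements at once.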

KulykDmytro avatar Apr 19 '24 07:04 KulykDmytro

Are there any loose timelines around when this can be resolved?

VladShuvalov avatar Apr 23 '24 18:04 VladShuvalov

I'm sorry, I know these sorts of "me too" comments are rarely of much help. I wanted to highlight that great-expectations at the pinned version has a variety of upper-bound constraints: https://raw.githubusercontent.com/great-expectations/great_expectations/0.15.50/requirements.txt

altair>=4.0.0,<4.2.1
pydantic>=1.10.4,<2.0
urllib3>=1.25.4,<1.27

And at least for us the problem isn't so much that "great expectations is old" but that being on the lower side of these transitive dependencies -- like the pydantic v1-v2 transition -- has ever-increasing opportunity costs. (In our particular transitive set, pydantic <2 is also keeping us on pandas<2, which adds further to the expense.)

I know this doesn't change anything about the difficulty of migration, but I hope it clarifies the "cost" somewhat when this issue is next triaged.
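
A rough way to list which declared requirements carry an upper bound is sketched below. The helper names are made up for illustration, and the filter is deliberately naive: it only looks for `<` in the version part and ignores `==` and `~=` pins, which also cap a dependency. The last sample entry is hypothetical and included only for contrast.

```python
from importlib import metadata


def upper_bounded(requirements):
    """Return requirement strings whose version part contains '<' or '<='."""
    capped = []
    for req in requirements:
        # Drop any environment marker (the part after ';') before checking.
        version_part = req.split(";", 1)[0]
        if "<" in version_part:
            capped.append(req)
    return capped


def declared_requirements(dist_name):
    """Requirement strings of an installed distribution, or [] if absent."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


# The three pins quoted above, plus one hypothetical unbounded entry.
sample = [
    "altair>=4.0.0,<4.2.1",
    "pydantic>=1.10.4,<2.0",
    "urllib3>=1.25.4,<1.27",
    "example-unbounded>=1.0",  # hypothetical, not from requirements.txt
]
capped = upper_bounded(sample)
print(capped)

# Against a real environment (prints [] if the package is not installed):
print(upper_bounded(declared_requirements("great-expectations")))
```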

cburroughs avatar May 06 '24 15:05 cburroughs

Any updates on this? The latest version of datahub_action for GX also needs to be updated to reflect the latest changes. It is a one-line change, though.

am2222 avatar Jun 04 '24 20:06 am2222

Just want to clarify which of these issues people are trying to solve:

  1. Use datahub_action with latest GX
  2. Install datahub ingestion sources inside one big venv (e.g. airflow)

shirshanka avatar Jun 28 '24 21:06 shirshanka

@shirshanka To get the datahub action working with the latest version of GX, I only had to modify a couple of lines of code to fix the class constructor. The bigger issue is that if Airflow is installed with the datahub plugin, we cannot use the latest version of GX in our DAGs because of a version conflict.

am2222 avatar Jun 29 '24 12:06 am2222

"Install datahub ingestion sources inside one big venv (e.g. airflow)"

This one. We use a monorepo, and minimizing the number of transitive dependency sets we are juggling maximizes the usefulness of said monorepo.

cburroughs avatar Jul 02 '24 14:07 cburroughs

@shirshanka It looks like the changes that introduced pydantic v2 support in great-expectations will be easy to backport to 0.15.50. If I do that, would datahub consider using them as a springboard to support pydantic v2 for plugins?

jskrzypek avatar Aug 01 '24 21:08 jskrzypek

If anyone wants it, I pushed it up to my fork, and here's the diff from 0.15.50. I am going to try patching datahub on a fork to consume this version of great expectations, and see if that works for us.

jskrzypek avatar Aug 01 '24 23:08 jskrzypek