odd-platform
Data Entities with DEG does not show correct lineages
Describe the bug
A Data Entity that is both a DataSet and a DataEntityGroup loses its lineage information for the dataset and only has the lineage from the DataEntityGroup.
Set up
ODD-Platform v0.15.0 (ghcr.io/opendatadiscovery/odd-platform:0.15.0)
Steps to Reproduce
Code to reproduce this behavior is available here: https://gist.github.com/ghosalya/aa25b2903d3d5bf728a8b8aad9731cec
It uses odd-models-package to call the Ingestion API to create Data Entities.
Steps to reproduce the behavior:
- Have odd-platform running at http://localhost:8080 (I followed this section of README.md for docker).
- Go to http://localhost:8080 and create a collector (Management->Collectors->Add Collector). Export the token as the environment variable ODD_PLATFORM_TOKEN.
- Install odd-models-package.
- Run odd_widget_example.py from the gist. This will create a number of entities, e.g. WIDGET_TABLE.
- Go to http://localhost:8080, look for the WIDGET_TABLE dataset and check the Lineage tab. It should show the widget_job -> widget_table lineage.
- Now run odd_widget_example_deg.py. This will modify WIDGET_TABLE to have a DataEntityGroup component.
- Go to http://localhost:8080 and look for the WIDGET_TABLE dataset; it should now have a DEG component.
- Go to WIDGET_TABLE's Lineage tab.
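For readers who cannot open the gist, the entity state after these steps can be roughly sketched as a plain payload. This is an illustration only, not the gist's code: the field names (`data_source_oddrn`, `items`, `dataset`, `data_entity_group`, `entities_list`, `data_transformer`) reflect my understanding of the ODD specification and should be treated as assumptions, and every ODDRN below is made up.

```python
import json

# Hypothetical ODDRNs, invented for illustration only.
SOURCE = "//example/widgets"
JOB = f"{SOURCE}/jobs/widget_job"
TABLE = f"{SOURCE}/tables/widget_table"
VERSIONS = [f"{TABLE}_v1", f"{TABLE}_v2"]


def build_entities(with_deg: bool) -> dict:
    """Return a DataEntityList-shaped dict (field names are assumptions)."""
    table = {
        "oddrn": TABLE,
        "name": "WIDGET_TABLE",
        "type": "TABLE",
        "dataset": {"field_list": []},
    }
    if with_deg:
        # Second script: the same entity additionally gets a DEG component.
        table["data_entity_group"] = {"entities_list": VERSIONS}
    job = {
        "oddrn": JOB,
        "name": "widget_job",
        "type": "JOB",
        # The job writes to the table; this is the lineage edge that disappears.
        "data_transformer": {"inputs": [], "outputs": [TABLE]},
    }
    return {"data_source_oddrn": SOURCE, "items": [job, table]}


before = build_entities(with_deg=False)  # state after odd_widget_example.py
after = build_entities(with_deg=True)    # state after odd_widget_example_deg.py
print(json.dumps(after["items"][1], indent=2))
```

The only difference between the two states is the extra `data_entity_group` component on WIDGET_TABLE; the job's output edge to the table is unchanged.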
Expected behavior
The Lineage tab should still show widget_job -> widget_table
Current behavior
The Lineage tab is overridden by the DEG component and only shows the DEG members, and we lose the original lineage.
Additional context
The code to submit data entity list uses odd-models==2.0.31
Hey @ghosalya!
Firstly, thank you for opening this ticket and for the comprehensive description you've provided!
The issue you're encountering stems from the combination of a dataset and a DEG. In instances like these, the ODD Platform prioritizes the lineage of the DEG. Moreover, during metadata ingestion, ODD Platform doesn’t cross-check against these specific classes and permits the creation of such combinations.
For us to address this effectively, could you shed some light on the rationale behind designating an entity as both a dataset and a DEG simultaneously? It's essential for us to grasp the underlying intentions so we can determine the best path forward and ensure that creating a DEG and dataset within the same entity is indeed meaningful.
Hi @DementevNikita
> For us to address this effectively, could you shed some light on the rationale behind designating an entity as both a dataset and a DEG simultaneously? It's essential for us to grasp the underlying intentions so we can determine the best path forward and ensure that creating a DEG and dataset within the same entity is indeed meaningful
This is one of the workarounds we are trying for https://github.com/opendatadiscovery/odd-platform/issues/1407
Essentially, we want a DataEntity that is a DataSet (i.e. WIDGET_TABLE), but also has a component that lists the versions of this dataset (WIDGET_TABLE_V1, WIDGET_TABLE_V2). In this case, I would like the lineage of WIDGET_TABLE to derive from its DataSet lineage, since it is first and foremost a table.
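To make the difference concrete, here is a toy model of the two resolution orders. It is not the platform's code: the `Entity` class and both functions are hypothetical and only mirror the behavior described in this thread (DEG lineage taking priority) and the behavior requested above (dataset lineage taking priority).

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a data entity with two lineage sources."""
    name: str
    upstream: list = field(default_factory=list)       # dataset lineage, e.g. producing jobs
    group_members: list = field(default_factory=list)  # DEG members


def lineage_deg_first(entity: Entity) -> list:
    """Behavior described in this thread: DEG members win when present."""
    return entity.group_members if entity.group_members else entity.upstream


def lineage_dataset_first(entity: Entity) -> list:
    """Requested behavior: dataset lineage wins; DEG members are the fallback."""
    return entity.upstream if entity.upstream else entity.group_members


table = Entity(
    name="widget_table",
    upstream=["widget_job"],
    group_members=["widget_table_v1", "widget_table_v2"],
)
print(lineage_deg_first(table))      # ['widget_table_v1', 'widget_table_v2']
print(lineage_dataset_first(table))  # ['widget_job']
```

A real fix might instead show both sources side by side; the sketch only illustrates which one wins under each ordering.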