Purview-ADB-Lineage-Solution-Accelerator

Support for Delta Live Tables

Open • gerson23 opened this issue 1 year ago • 1 comment

Describe the feature

In our project, we are starting to use Delta Live Tables (DLT) and we need to publish its lineage information to Purview. However, the solution accelerator does not seem to support DLT pipelines yet.

Detailed Example

Ideally, we should be able to see the DLT pipeline lineage in Purview, connecting the assets the pipeline reads from to the assets it creates as output.

flowchart LR;
  raw["raw asset"];
  bronze_table["bronze table"];
  silver_table["silver table"];
  gold_table["gold table"];
  asset["aggregated asset"];
  raw --> bronze_table;
  subgraph DLT Pipeline
  bronze_table --> silver_table;
  silver_table --> gold_table;
  end
  gold_table --> asset;

Another, simpler option would be to show lineage the same way it is shown for notebooks, hiding the internals of the pipeline.

flowchart LR
  raw["raw asset"]
  dlt["DLT Pipeline"]
  asset["aggregated asset"]
  raw --> dlt
  dlt --> asset
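
To illustrate how the two views relate, here is a minimal Python sketch that collapses the detailed lineage into the simpler one. The function name `collapse_pipeline`, the edge-list representation, and the node labels are hypothetical and chosen only for this example; nothing here is part of the solution accelerator.

```python
def collapse_pipeline(edges, internal, pipeline_node="DLT Pipeline"):
    """Replace every node listed in `internal` with one pipeline node.

    `edges` is a list of (source, target) pairs. Edges that end up
    pointing from the pipeline to itself are dropped, and duplicates
    are removed while preserving order.
    """
    collapsed = []
    for src, dst in edges:
        src = pipeline_node if src in internal else src
        dst = pipeline_node if dst in internal else dst
        if src != dst and (src, dst) not in collapsed:
            collapsed.append((src, dst))
    return collapsed


# Detailed lineage, mirroring the first diagram (labels are illustrative).
detailed = [
    ("raw asset", "bronze table"),
    ("bronze table", "silver table"),
    ("silver table", "gold table"),
    ("gold table", "aggregated asset"),
]
# Tables that live inside the DLT pipeline.
internal = {"bronze table", "silver table", "gold table"}

simple = collapse_pipeline(detailed, internal)
for src, dst in simple:
    print(f"{src} --> {dst}")
```

With the sample data, only the edges into and out of the pipeline remain, which matches the second diagram.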

Issues that this feature solves

N/A

Suggested Implementation

I am not sure how to implement this. I do know that, if we go this route, the pipeline information itself is available in the DLT event log: https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/observability#--query-lineage-information-from-the-event-log
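
As a rough starting point, the sketch below turns event log records into lineage edges. It assumes, based on my reading of the linked documentation, that lineage is carried by events with `event_type` equal to `flow_definition`, and that their `details` hold `flow_definition.output_dataset` and `flow_definition.input_datasets`. The exact shape of `input_datasets` is an assumption, so the code accepts both plain names and objects with a `name` field. The helper `lineage_edges` and the sample records are hypothetical, and the records are assumed to have already been read from the event log into plain dictionaries.

```python
import json


def lineage_edges(events):
    """Extract (input, output) dataset pairs from event log records.

    Assumed record shape: a dict with an `event_type` string and a
    `details` value that is either a dict or a JSON string.
    """
    edges = []
    for event in events:
        if event.get("event_type") != "flow_definition":
            continue
        details = event.get("details") or {}
        if isinstance(details, str):
            details = json.loads(details)
        flow = details.get("flow_definition") or {}
        output = flow.get("output_dataset")
        for item in flow.get("input_datasets") or []:
            # Accept either {"name": "..."} objects or bare strings.
            name = item.get("name") if isinstance(item, dict) else item
            if name and output:
                edges.append((name, output))
    return edges


# Hypothetical sample records, not real event log output.
sample_events = [
    {
        "event_type": "flow_definition",
        "details": json.dumps({
            "flow_definition": {
                "output_dataset": "silver_table",
                "input_datasets": [{"name": "bronze_table"}],
            }
        }),
    },
    {"event_type": "flow_progress", "details": "{}"},
    {
        "event_type": "flow_definition",
        "details": {
            "flow_definition": {
                "output_dataset": "gold_table",
                "input_datasets": ["silver_table"],
            }
        },
    },
]

edges = lineage_edges(sample_events)
for src, dst in edges:
    print(f"{src} --> {dst}")
```

Edges in this form could then be mapped to Purview lineage entities, although that mapping is outside the scope of this sketch.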

Additional context

N/A

gerson23 • Jul 27 '23 15:07

@gerson23, thank you so much for this great suggestion! I would love to be able to cover Delta Live Tables, but we need OpenLineage to support it. OpenLineage/OpenLineage#372 expresses a desire to cover Spark streaming jobs, but I know the OpenLineage community needs more support in this area.

wjohnson • Dec 30 '23 16:12