Purview-ADB-Lineage-Solution-Accelerator
Support for Delta Live Tables
Describe the feature In our project, we are starting to use Delta Live Tables (DLT) and we need to publish the lineage information to Purview. However, the solution accelerator doesn't seem to support DLT pipelines yet.
Detailed Example Ideally, we should be able to see the DLT pipeline lineage in Purview, connecting the assets that DLT reads from to the assets that the pipeline produces as output.
flowchart LR;
raw["raw asset"];
bronze_table["bronze table"];
silver_table["silver table"];
gold_table["gold table"];
asset["aggregated asset"];
raw --> bronze_table;
subgraph DLT Pipeline
bronze_table --> silver_table;
silver_table --> gold_table;
end
gold_table --> asset;
Another, simpler option would be to show lineage the same way it is shown for notebooks, hiding the internals of the pipeline.
flowchart LR
raw["raw asset"]
dlt["DLT Pipeline"]
asset["aggregated asset"]
raw --> dlt
dlt --> asset
Issues that this feature solves N/A
Suggested Implementation I am not sure how to implement this. I do know that, if we take the simpler route, the pipeline's lineage information is available in the DLT event log: https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/observability#--query-lineage-information-from-the-event-log
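As a rough sketch of that idea (not a tested integration), the snippet below shows how lineage edges might be derived from event log rows and then collapsed into the simpler "inputs, pipeline, outputs" view. It assumes that rows with `event_type` equal to `flow_definition` carry a `details` JSON payload with `flow_definition.output_dataset` and `flow_definition.input_datasets`, which is how I read the linked documentation. The sample rows and the helper names `extract_edges` and `collapse_pipeline` are hypothetical, and nothing here talks to Purview or OpenLineage.

```python
import json


def extract_edges(events):
    """Return (input, output) dataset pairs from flow_definition events."""
    edges = []
    for event in events:
        if event.get("event_type") != "flow_definition":
            continue  # other event types carry no lineage in this sketch
        details = event.get("details") or {}
        if isinstance(details, str):
            # Assumption: the details column may arrive as a JSON string.
            details = json.loads(details)
        flow = details.get("flow_definition", {})
        output = flow.get("output_dataset")
        if output is None:
            continue
        # input_datasets may be missing or null for some flows.
        for source in flow.get("input_datasets") or []:
            edges.append((source, output))
    return edges


def collapse_pipeline(edges):
    """Hide pipeline internals: keep only external inputs and final outputs."""
    produced = {output for _, output in edges}
    consumed = {source for source, _ in edges}
    inputs = sorted(consumed - produced)    # read but never written here
    outputs = sorted(produced - consumed)   # written but never read here
    return inputs, outputs


# Hypothetical rows, shaped like the event log fields assumed above.
sample_events = [
    {"event_type": "create_update", "details": "{}"},
    {
        "event_type": "flow_definition",
        "details": json.dumps({
            "flow_definition": {
                "output_dataset": "bronze_table",
                "input_datasets": ["raw"],
            }
        }),
    },
    {
        "event_type": "flow_definition",
        "details": {
            "flow_definition": {
                "output_dataset": "silver_table",
                "input_datasets": ["bronze_table"],
            }
        },
    },
    {
        "event_type": "flow_definition",
        "details": {
            "flow_definition": {
                "output_dataset": "gold_table",
                "input_datasets": ["silver_table"],
            }
        },
    },
]

edges = extract_edges(sample_events)
print(edges)
print(collapse_pipeline(edges))  # (['raw'], ['gold_table'])
```

The full edge list would correspond to the detailed diagram, and the collapsed result to the notebook-style one. In a real setup the rows would come from querying the pipeline's event log rather than from a hard-coded list.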
Additional context N/A
@gerson23 thank you so much for this great suggestion! I would love to be able to cover Delta Live Tables but we need OpenLineage to support it. In OpenLineage/OpenLineage#372 there is a desire to cover Spark streaming jobs but I know the OpenLineage community needs more support in this area.