Lineage is not published to Purview - START events missing environment-properties

Open gerson23 opened this issue 10 months ago • 0 comments

Describe the bug

START events are skipped in the OpenLineageIn function because they are missing the environment-properties field. As a result, COMPLETE events are not processed by the PurviewOut function, so no lineage is published to Purview.

However, as discussed in https://github.com/OpenLineage/OpenLineage/issues/2203, this can be considered a valid scenario: the OL model is cumulative, so a subsequent RUNNING event should carry the environment-properties information that the START event lacked.

Expected behavior

START events should be accepted even when the environment-properties field is missing. RUNNING events should be accepted as well and, when they carry environment-properties, used to fill in that information. A COMPLETE event can then be processed properly and lineage published to Purview.
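
To make the expected behavior concrete, here is a minimal Python sketch of the cumulative handling I have in mind. It is not the accelerator's actual code: the `RunAccumulator` class, its in-memory dictionary, and the assumption that the facet lives under `run.facets["environment-properties"]` are all illustrative, and the real OpenLineageIn/PurviewOut functions store and consolidate events differently.

```python
from typing import Any, Dict, Optional

# Assumed location of the facet: event["run"]["facets"]["environment-properties"]
ENV_FACET = "environment-properties"


class RunAccumulator:
    """Hypothetical in-memory store that accumulates facets per runId."""

    def __init__(self) -> None:
        # runId -> last seen environment-properties facet
        self._env: Dict[str, Dict[str, Any]] = {}

    def handle(self, event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Accept any event; return a consolidated event only on COMPLETE."""
        run = event.get("run") or {}
        run_id = run.get("runId")
        if run_id is None:
            return None

        facets = run.get("facets") or {}
        # START and RUNNING events are both accepted; whichever carries
        # the facet contributes it to the accumulated state for this run.
        if ENV_FACET in facets:
            self._env[run_id] = facets[ENV_FACET]

        if event.get("eventType") != "COMPLETE":
            return None

        # On COMPLETE, merge in the facet collected from earlier events.
        env = self._env.pop(run_id, None)
        if env is None:
            return None  # nothing to consolidate; lineage cannot be built
        merged_run = {**run, "facets": {**facets, ENV_FACET: env}}
        return {**event, "run": merged_run}
```

With this approach, a START event without the facet is simply recorded as a no-op, a later RUNNING event supplies the facet, and the COMPLETE event is returned with the facet merged in, ready for downstream processing.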

Environment

  • OpenLineage Version: OL 1.5.0+ (I changed parameter for the newer version)
  • Databricks Runtime Version: 14.3
  • Cluster Type: Job

gerson23, Apr 04 '24 15:04