Purview-ADB-Lineage-Solution-Accelerator
Lineage not getting displayed for all tables
Lineage became visible for a table on the first run. However, it is no longer updating after additional notebooks and tables were included. The code does a simple CTAS: CREATE TABLE <TABLE_NAME> USING DELTA AS SELECT * FROM <SOURCE_TABLE_NAME>
The source table is in ADLS Gen2. The target table is a managed table in DBFS (Databricks default database).
Expected behavior New lineage information should show up in Purview.
Logs PurviewOut.log
In PurviewOut.log, there is an error:
Information
2023-05-24 10:00:39.049 Error Loading to Purview JSON Entiitesto Purview: Return Code: BadRequest - Reason:Bad Request Error
2023-05-24 10:00:39.049 Purview Publish Entity Metadata Error : Error :{"requestId":"fc68faa4-73c4-4808-a77b-2fe96f65546e","errorCode":"ATLAS-400-00-036","errorMessage":"invalid relationshipDef: process_dataset_outputs: end type 1: databricks_process, end type 2: databricks_notebook"} Error
2023-05-24 10:00:40.128 Executed 'Functions.PurviewOut' (Succeeded, Id=0783df86-0011-480e-90c2-1c3660514b4d, Duration=4766ms) Information
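To make the Atlas error easier to read, here is a minimal Python sketch that pulls the relationship name and the two end types out of the JSON payload logged above. The payload string is copied from the log; the function name `parse_relationship_error` and the regular expression are illustrative assumptions, not part of the accelerator.

```python
import json
import re

# Error payload copied verbatim from the PurviewOut.log excerpt.
payload = (
    '{"requestId":"fc68faa4-73c4-4808-a77b-2fe96f65546e",'
    '"errorCode":"ATLAS-400-00-036",'
    '"errorMessage":"invalid relationshipDef: process_dataset_outputs: '
    'end type 1: databricks_process, end type 2: databricks_notebook"}'
)

# Hypothetical helper: the pattern is inferred from this one message only.
PATTERN = re.compile(
    r"invalid relationshipDef: (\w+): end type 1: (\w+), end type 2: (\w+)"
)

def parse_relationship_error(raw):
    """Return the relationship and end types, or None if the message differs."""
    body = json.loads(raw)
    match = PATTERN.search(body.get("errorMessage", ""))
    if match is None:
        return None
    return {
        "error_code": body.get("errorCode"),
        "relationship": match.group(1),
        "end1_type": match.group(2),
        "end2_type": match.group(3),
    }

parsed = parse_relationship_error(payload)
print(parsed)
```

Read this way, the message says that the output end of `process_dataset_outputs` resolved to a `databricks_notebook` entity, which is presumably why Atlas rejects the relationship.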
Screenshots NA
Desktop (please complete the following information):
- OS: Windows
- OpenLineage Version: openlineage-spark-0.18.0.jar
- Databricks Runtime Version: 11.3
- Cluster Type: Interactive
- Cluster Mode: No Isolation Shared
- Using Credential Passthrough: No
Additional context The lineage data showed up the first time, so the setup seems to be good. There appears to be an Atlas error in PurviewOut.log.
@mithun1979 it looks like the input is okay, but the output points to /user/hive/warehouse/test_call_center_schema_chnaged and is not mapping correctly to the Hive metastore. Unfortunately, the Purview search results include a databricks_notebook, and the hive table is being mapped to the first object that the search returns as a match.
Support for Delta is limited but we will try to get better support for this.
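The mismatch described above can be illustrated with a small Python sketch. This is not the accelerator's code: the search hits, the `entityType` and `qualifiedName` field names, the qualified names, and the helper functions are all hypothetical, chosen only to show how taking the first search hit can pick a notebook while a type filter would not.

```python
# Hypothetical search hits, in the order a search might return them.
search_hits = [
    {"entityType": "databricks_notebook",
     "qualifiedName": "hypothetical://notebooks/test_call_center"},
    {"entityType": "hive_table",
     "qualifiedName": "hypothetical://hive/default/test_call_center_schema_chnaged"},
]

# Assumption: only dataset-like types are valid outputs of a process.
DATASET_TYPES = {"hive_table"}

def first_hit(hits):
    """Take whatever the search returns first, regardless of its type."""
    return hits[0] if hits else None

def first_dataset_hit(hits, allowed_types=DATASET_TYPES):
    """Take the first hit whose type is an allowed dataset type."""
    for hit in hits:
        if hit["entityType"] in allowed_types:
            return hit
    return None

naive = first_hit(search_hits)
filtered = first_dataset_hit(search_hits)
print(naive["entityType"])
print(filtered["entityType"])
```

With these made-up hits, the unfiltered lookup selects the notebook, which matches the `end type 2: databricks_notebook` reported in the log, while the filtered lookup selects the table.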