datahub
Redshift lineage cannot be obtained accurately when a temp table is involved.
Describe the bug: When a temp table is used as an intermediate step, the Redshift lineage is not captured accurately.
To Reproduce
DROP TABLE IF EXISTS tmp_table;
create temp table tmp_table as
select
a,
b
from tableAA;
insert into tableBB select * from tmp_table;
The lineage for tableBB contains neither tmp_table nor tableAA.
Expected behavior: The lineage for tableBB should include tmp_table and tableAA.
https://github.com/datahub-project/datahub/blob/075d19ef166177ececfbb39796de4721bdde9dc1/metadata-ingestion/src/datahub/ingestion/source/sql/redshift.py#L798-L853
The likely cause is that SVV_TABLE_INFO does not permanently store temporary table information; it only includes temporary tables created by a user in the current session. I can't find anywhere that Redshift stores all temporary tables, so I may need to derive the lineage by parsing the DDL.
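To illustrate the "parse the DDL" idea, here is a deliberately naive sketch that pulls target/source pairs out of statements like the ones in the reproduction. The function name `lineage_from_statements` and the regular expressions are hypothetical and are not part of DataHub; a real implementation would need a proper SQL parser to handle joins, subqueries, CTEs, and schema-qualified names.

```python
import re
from typing import Dict, Set

# Naive patterns (assumption): capture the target table, then the first
# table that follows FROM in the same statement.
_CTAS = re.compile(
    r"create\s+(?:temp|temporary)\s+table\s+(\w+)\s+as\b.*?\bfrom\s+(\w+)",
    re.IGNORECASE | re.DOTALL,
)
_INSERT = re.compile(
    r"insert\s+into\s+(\w+)\b.*?\bfrom\s+(\w+)",
    re.IGNORECASE | re.DOTALL,
)


def lineage_from_statements(sql: str) -> Dict[str, Set[str]]:
    """Map each written table to the tables it reads from (hypothetical helper)."""
    upstreams: Dict[str, Set[str]] = {}
    for statement in sql.split(";"):
        for pattern in (_CTAS, _INSERT):
            match = pattern.search(statement)
            if match:
                target, source = match.groups()
                upstreams.setdefault(target, set()).add(source)
    return upstreams


sql = """
DROP TABLE IF EXISTS tmp_table;
create temp table tmp_table as
select a, b
from tableAA;
insert into tableBB select * from tmp_table;
"""
parsed = lineage_from_statements(sql)
print(parsed)
```

With the reproduction script, this yields an edge from tableAA to tmp_table and one from tmp_table to tableBB, which is the information SVV_TABLE_INFO no longer provides once the session ends.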
Currently, we drop tables from the lineage if they no longer exist. A possible solution is to resolve connections of the form A -> TempB -> C into A -> C when TempB no longer exists.
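The A -> TempB -> C to A -> C idea can be sketched roughly as follows. This is only an illustration under assumed data structures (a dict of direct upstreams and a set of tables that still exist); `collapse_missing_tables` is a hypothetical name and does not reflect how DataHub actually implements it.

```python
from typing import Dict, Set


def collapse_missing_tables(
    upstreams: Dict[str, Set[str]], existing: Set[str]
) -> Dict[str, Set[str]]:
    """Rewire lineage so that tables which no longer exist are skipped.

    `upstreams` maps each table to its direct upstream tables.
    `existing` is the set of tables that still exist (assumed input).
    """

    def resolve(table: str, seen: Set[str]) -> Set[str]:
        # Collect the nearest ancestors of `table` that still exist.
        result: Set[str] = set()
        for parent in upstreams.get(table, set()):
            if parent in seen:
                continue  # guard against cycles
            if parent in existing:
                result.add(parent)
            else:
                # Parent is gone (e.g. a temp table): look through it.
                result |= resolve(parent, seen | {parent})
        return result

    return {t: resolve(t, {t}) for t in upstreams if t in existing}


lineage = {"tmp_table": {"tableAA"}, "tableBB": {"tmp_table"}}
collapsed = collapse_missing_tables(lineage, existing={"tableAA", "tableBB"})
print(collapsed)  # {'tableBB': {'tableAA'}}
```

Here tmp_table is dropped from the result, but its upstream tableAA is attached directly to tableBB instead of being lost.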
This was fixed by https://github.com/datahub-project/datahub/pull/9704