spark-atlas-connector
Temporary tables stored in Atlas
Currently the Spark Atlas Connector reports temporary thrift tables to Atlas as spark_table. Below you can find an example lineage report. The questions we have about these temporary tables:
- Why does the spark-atlas-connector report temporary tables to Atlas in the first place?
- If there is a good reason to have the temporary tables reported:
  - why are the entities not reported as deleted?
  - why is there no attribute describing that the table is deleted?
  - why is the table type MANAGED, given that the table is temporary?
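One way to verify what Atlas actually stores for such an entity is to look it up by qualified name over the Atlas v2 REST API and inspect its `status` field (`ACTIVE` vs. `DELETED`). A minimal sketch, assuming an Atlas server reachable at `http://atlas-host:21000` (a placeholder hostname) and no authentication; only the URL-building helper is exercised below:

```python
import json
import urllib.parse
import urllib.request

def entity_lookup_url(atlas_base, type_name, qualified_name):
    """Build the Atlas v2 unique-attribute lookup URL for an entity."""
    return (f"{atlas_base}/api/atlas/v2/entity/uniqueAttribute/type/"
            f"{type_name}?attr:qualifiedName="
            f"{urllib.parse.quote(qualified_name, safe='')}")

def fetch_entity_status(atlas_base, type_name, qualified_name):
    """Fetch the entity and return its status ('ACTIVE' or 'DELETED')."""
    req = urllib.request.Request(
        entity_lookup_url(atlas_base, type_name, qualified_name))
    with urllib.request.urlopen(req) as resp:
        entity = json.load(resp)["entity"]
    return entity["status"]

# qualifiedName taken from the example lineage in this report
qn = ("thrift://node1:9083,thrift://node2:9083,thrift://node3:9083"
      ".DBNAME.o_TABLENAME_xref_20190328")
url = entity_lookup_url("http://atlas-host:21000", "spark_table", qn)
```

If the reported entity shows `status: ACTIVE` after the temporary table has been dropped, that matches the behaviour described in this issue.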
Example lineage reported:
createTime: 1553897909000
database: DBNAME
description: [empty]
lastAccessTime: 0
name: o_TABLENAME_xref_20190328
owner: [owner of task]
partitionColumnNames: [empty]
properties: transient_lastDdlTime: 1553897909, bucketing_version: 2
provider: parquet
qualifiedName: thrift://node1:9083,thrift://node2:9083,thrift://node3:9083.DBNAME.o_TABLENAME_xref_20190328
schema: [empty]
storage: thrift://node1:9083,thrift://node2:9083,thrift://node3:9083.DBNAME.o_TABLENAME_xref_20190328.storageFormat
tableType: MANAGED
unsupportedFeatures: [empty]
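For reference, the qualifiedName above appears to follow the pattern `<metastore URIs>.<database>.<table>`. A small helper of our own (not SAC code) that splits such a name back into its parts, assuming the database and table names themselves contain no dots:

```python
def split_qualified_name(qualified_name):
    """Split a SAC-style table qualifiedName into (metastore_uris, database, table).

    The metastore URI list may itself contain dots, so split from the
    right: the last two dot-separated segments are taken to be the
    database and table name.
    """
    uris, database, table = qualified_name.rsplit(".", 2)
    return uris, database, table

# Example from the lineage report above
uris, database, table = split_qualified_name(
    "thrift://node1:9083,thrift://node2:9083,thrift://node3:9083"
    ".DBNAME.o_TABLENAME_xref_20190328")
```

Note the caveat: with fully qualified metastore hostnames (which contain dots), splitting from the right still works only because the database and table segments are dot-free.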
Thanks for reporting! It would be pretty helpful if you could include steps to reproduce as well, along with which branch/commit you used to reproduce the issue.
SAC is a moving target and we have no plans for official releases, so if the issue doesn't reproduce on current master, we may not address it on a previous version/branch.
Thanks again!