spline-spark-agent
Spline Support of Statement CREATE TABLE AS SELECT
I have a statement in a Databricks notebook:
CREATE TABLE lineage_data.lineagedemo.dinner_1
AS
SELECT recipe_id,
       concat(app, " + ", main, " + ", dessert) AS full_menu
FROM lineage_data.lineagedemo.menu
The lineage is not shown. It only shows up for INSERT INTO.
Is that by design?
I use version
spark_3_3_spline_agent_bundle_2_12_2_1_0_SNAPSHOT.jar
It should be supported. Is there any error in logs? Try "USING DELTA" (not sure if it's going to make any difference, but we use it like that).
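For reference, applying that suggestion to the statement from the question would look roughly like this (same tables as above; whether the explicit format makes a difference is exactly what is being tested here):

```sql
-- CTAS with an explicit Delta format, as suggested
CREATE TABLE lineage_data.lineagedemo.dinner_1
USING DELTA
AS
SELECT recipe_id,
       concat(app, " + ", main, " + ", dessert) AS full_menu
FROM lineage_data.lineagedemo.menu;
```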
Hi, I have run this code:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
# Assuming SparkSession is already available as 'spark' in Databricks
# Define the schema for the table
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
# Create a DataFrame with the schema
data = [(1, "Alice", 30), (2, "Bob", 35), (3, "Charlie", 40)]
df = spark.createDataFrame(data, schema)
# Define the table name and path (replace with your desired table name and path)
silver_table_name = "silver_table_name1"
path = "/mnt/delta/" + silver_table_name
# Write the DataFrame to the table
df.write.format("delta").mode("overwrite").save(path)
# Register the table in the metastore
spark.sql(f"CREATE TABLE {silver_table_name} USING DELTA LOCATION '{path}'")
# Create a view over the silver layer table
# (CREATE VIEW here is a plain view, not a materialized one)
materialized_view_name = "your_materialized_view"
create_mv_sql = f"""
CREATE VIEW {materialized_view_name}
AS SELECT * FROM {silver_table_name}
"""
spark.sql(create_mv_sql)
# Optionally, verify the contents of the table and the view
spark.sql(f"SELECT * FROM {silver_table_name}").show()
spark.sql(f"SELECT * FROM {materialized_view_name}").show()
According to the execution plan I got in Spline, it shows only the target /mnt/delta/silver_table_name1 with its columns. It seems the CREATE VIEW statement is not registered in the lineage.
@wajda Also, when I added USING DELTA it didn't solve it:
-- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS lineage_data.lineagedemo.Accounts1;
-- Drop the table 'V_Accounts1' if it exists
DROP TABLE IF EXISTS lineage_data.lineagedemo.V_Accounts1;
-- Creating the table 'Accounts1' using Delta format
CREATE TABLE lineage_data.lineagedemo.Accounts1(
Id INT,
Name STRING
) USING DELTA;
-- Creating the table 'V_Accounts1' based on the 'Accounts1' table, using Delta format
CREATE TABLE lineage_data.lineagedemo.V_Accounts1
USING DELTA AS
SELECT *
FROM lineage_data.lineagedemo.Accounts1;
@wajda If I add USING DELTA, it also works for CREATE TABLE AS SELECT:
-- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS Accounts1;
-- Drop the table 'V_Accounts1' if it exists
DROP TABLE IF EXISTS V_Accounts1;
-- Creating the table 'Accounts1' using Delta format
CREATE TABLE Accounts1(
Id INT,
Name STRING
) USING DELTA;
-- Creating the table 'V_Accounts1' based on the 'Accounts1' table, using Delta format
CREATE TABLE V_Accounts1
USING DELTA AS
SELECT *
FROM Accounts1;
It might be that CREATE VIEW is either not supported by Spline, or not trackable at all in Spark. A more thorough investigation is needed. Try enabling DEBUG logging in Spark and see if there is a message like
XXX was not recognized as a write-command
or "Write extraction failed" followed by an object structure dump.
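With DEBUG logging enabled, those messages can be grepped out of the driver log. A minimal sketch, using an inline sample line in place of a real log file:

```shell
# Pattern covering both Spline diagnostics mentioned above
pattern="was not recognized as a write-command|Write extraction failed"
# In practice you would grep your driver log, e.g.:  grep -E "$pattern" driver.log
# Inline sample line standing in for a real log entry:
echo "CreateViewCommand was not recognized as a write-command" | grep -E "$pattern"
```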
Another small thing to try is to enable non-persistent actions capturing like this:
spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true
If you still do not see any mention of your "VIEW" anywhere - in logs, errors, or lineage - then it's very likely that Spark simply does not provide the information about that action in the first place. Please try and let us know what you've found.
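On Databricks, such a property is typically entered in the cluster's Spark config (Advanced options). Note that, per the Spline agent's configuration convention, properties passed via Spark config are usually given a spark. prefix; treat the exact key below as an assumption to verify against your agent version:

```
spark.spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled true
```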
I will try spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true. But how can I enable DEBUG logging on Databricks?
I have added spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true to the Advanced Settings. It caused additional execution plans to be captured, but none of them showed the view and its columns. @wajda - Do you know how I can enable DEBUG logging in Databricks?
By the way, Databricks Unity Catalog supports CREATE VIEW AS SELECT * FROM table.
Do you know how can i enable DEBUG logging in Databricks?
spark.sparkContext.setLogLevel("DEBUG")
I put this at the top of my Databricks notebook:
%python
spark.sparkContext.setLogLevel("debug")
%python
sc._jvm.za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking(spark._jsparkSession)
%SQL
-- Drop the table 'Accounts1' if it exists
DROP TABLE IF EXISTS Accounts1;
-- Creating the table 'Accounts1' using Delta format
CREATE TABLE Accounts1(
Id INT,
Name STRING
) USING DELTA;
-- Creating or replacing the view 'V_Accounts1' based on the 'Accounts1' table
CREATE OR REPLACE VIEW V_Accounts1
AS
SELECT *
FROM Accounts1;
But I don't see any error in the driver logs.
@wajda I faced a similar issue and was able to see the plan. Although I don't have the details right now of what Spline generates for CTAS, I identified that the Spark change below caused this issue. Although this change applies starting with Spark 3.3, Databricks usually backports changes to older versions as well; the same was true for the ListQuery expression I fixed earlier in Spline.
Spline needs to extract the logical plan class correctly for this to work:
https://github.com/apache/spark/commit/aa1b16b1acd67c22544f969eb8595b02bbc1b433#diff-594addaa7ed3dd43fd0dad5fa81ade5ec2e570adf7c982c0570e3358a08a8e0a