spline-spark-agent

Checkpoint lineage support

Open tcluzhe opened this issue 2 years ago • 1 comment

With code like the following, Spline cannot capture the input data source:

from pyspark.sql import SparkSession


spark = (SparkSession.builder
        .config('spark.sql.queryExecutionListeners', 'za.co.absa.spline.harvester.listener.SplineQueryExecutionListener')
        .config('spark.spline.producer.url', 'http://master-1-1:8080/producer')
        .enableHiveSupport()
        .getOrCreate()
    )


def generate_data():
    data = [
            ('a', 1),
            ('b', 2),
            ]

    df = spark.createDataFrame(data, ['name', 'value'])
    df.write.saveAsTable('test.table', mode='overwrite')


def test():
    spark.sparkContext.setCheckpointDir('/tmp/checkpoint')
    df = spark.table('test.table')
    df = df.checkpoint()     # checkpoint truncates the logical plan; the read of test.table is missing from the lineage
    df.write.saveAsTable('test.table2', mode='overwrite')


if __name__ == '__main__':
    # generate_data()
    test()


tcluzhe, Dec 01 '22 06:12

+1, we would definitely appreciate this feature too (either way, we're losing a whole chunk of lineage).

katerina-glushko, Apr 01 '24 14:04
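
Until checkpoint lineage is supported, one possible workaround is to replace `df.checkpoint()` with an explicit write followed by a read of the same location. This is only a sketch, not something confirmed by the Spline maintainers in this thread: it assumes the agent records ordinary Parquet writes and reads, so the lineage would become two linked steps (`test.table` to the intermediate path, then the intermediate path to `test.table2`). The helper name `materialize` and the path in the usage note are hypothetical.

```python
def materialize(spark, df, path):
    """Write `df` to `path` as Parquet, then read it back.

    Like df.checkpoint(), this cuts the query plan at this point, but it
    does so through a regular write and a regular read, both of which are
    visible as data sources. Assumption: the lineage agent tracks them.
    """
    # Persist the intermediate result, replacing any previous run's output.
    df.write.mode('overwrite').parquet(path)
    # Continue from the persisted copy instead of from the original plan.
    return spark.read.parquet(path)
```

In the `test()` function from the report, this would stand in for the checkpoint call, for example `df = materialize(spark, df, '/tmp/materialized/test_table')`. The trade-off is that the intermediate files are not cleaned up automatically and need to be removed by the job itself.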