dbt-databricks
Can I use a dbt Python model to load S3 data as a DataFrame in Databricks?
Hello, our requirement is as follows, and we would like to seek your advice. We want to use dbt to connect to Databricks. The source data is on S3, and we want to use PySpark to load it into a DataFrame first (like the "E" in ETL), do some transforms on it, and then write the result to a Databricks table.
The code for PySpark looks something like this:
# Read the raw parquet files from S3 into a DataFrame
# (unpack the list, since DataFrameReader.parquet takes paths as varargs)
origin_df = spark.read.parquet(*parquetFileList)

# Register a temp view so we can transform it with Spark SQL
origin_df.createOrReplaceTempView("origin_table")
final_df = spark.sql("select ...... from origin_table ......")
Can we do these things in a dbt Python model?
I believe that as long as you return a DataFrame, the dbt adapter will handle writing it to the target table. If your Spark code works in a Databricks notebook, I'd expect it to work with the adapter as well.
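To illustrate, here is a minimal sketch of how the snippet above might look as a dbt Python model, following the standard `def model(dbt, session)` contract where `session` is the SparkSession. The S3 path and model name are hypothetical placeholders, and the cluster is assumed to already have S3 access configured:

```python
# models/s3_extract.py -- hypothetical dbt Python model (sketch, not a tested implementation)

def model(dbt, session):
    # Standard dbt Python model entry point: `dbt` exposes config/refs,
    # `session` is the active SparkSession on the Databricks cluster.
    dbt.config(materialized="table")

    # "Extract": read the raw parquet files from S3 (hypothetical path)
    origin_df = session.read.parquet("s3://my-bucket/raw/events/")

    # "Transform": register a temp view and shape the data with Spark SQL
    origin_df.createOrReplaceTempView("origin_table")
    final_df = session.sql("select * from origin_table")

    # Returning the DataFrame lets the adapter materialize it as the model's table
    return final_df
```

The key design point is that the model does not write the table itself; it just returns the final DataFrame and lets dbt-databricks handle the materialization.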
@benc-db Thanks for your reply, we will give it a try!