dbt-spark

[CT-983] Support dbt Python models in OSS Apache Spark

Open jtcohen6 opened this issue 3 years ago • 5 comments

Context: https://github.com/dbt-labs/dbt-spark/discussions/407

The current implementation depends on Databricks APIs that are not available in OSS Apache Spark. We would like help from knowledgeable and interested community members, who could spec out an implementation using Spark-only functionality.

The entry point is submit_python_job:

https://github.com/dbt-labs/dbt-spark/blob/7f6cffecf38b7c41aa441eb020d464ba1e20bf9e/dbt/adapters/spark/impl.py#L392

Potentially useful Spark doc: Submitting Applications
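As a starting point for discussion, here is a minimal sketch of what a Spark-only submission path could look like: write the compiled model to a file and hand it to `spark-submit`. It assumes the compiled Python code can run as a standalone script. The helper names `build_spark_submit_command` and `submit_compiled_model` are hypothetical and not part of dbt-spark. Only the `spark-submit` options `--master`, `--deploy-mode`, and `--conf` come from Spark itself.

```python
import subprocess
import tempfile
from typing import Dict, List, Optional


def build_spark_submit_command(
    script_path: str,
    master: str = "local[*]",
    deploy_mode: Optional[str] = None,
    conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the argv list for spark-submit (hypothetical helper, not dbt-spark API)."""
    cmd = ["spark-submit", "--master", master]
    if deploy_mode is not None:
        cmd += ["--deploy-mode", deploy_mode]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application script goes last, after all spark-submit options.
    cmd.append(script_path)
    return cmd


def submit_compiled_model(compiled_code: str, **kwargs) -> None:
    """Write the compiled model to a temp file and run it via spark-submit.

    Assumption: spark-submit is on PATH and compiled_code is a complete script.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(compiled_code)
        script_path = handle.name
    # check=True raises CalledProcessError if the Spark job exits non-zero.
    subprocess.run(build_spark_submit_command(script_path, **kwargs), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes the argument handling testable without a Spark installation.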

jtcohen6 avatar Aug 03 '22 12:08 jtcohen6

Let us know if you'd like to help on this issue!

lostmygithubaccount avatar Nov 01 '22 17:11 lostmygithubaccount

Hi Cody, thanks for reaching out. Yes, I would like to help, but my available time is really stretched at the moment. Regards

Sebastian

Waltherr avatar Nov 03 '22 07:11 Waltherr

Hi, is there any update or current plan on this?

Adricu8 avatar Feb 28 '23 14:02 Adricu8

I'd like to vouch that this would be an important feature. It would open dbt up to a different set of data engineers: those who mostly work with Spark but also want the rigor and data quality framework of dbt.

huydeelll avatar Nov 21 '23 04:11 huydeelll

I have a non-production-grade sample that uses the same approach as duckdb to run Python models: https://github.com/timvw/dbt-spark/tree/support-sparksession-python-local
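For readers unfamiliar with that approach, the general idea is to run the model inside the dbt process itself rather than submitting it to a remote API. The sketch below is an illustration of that idea only, not code taken from the linked branch or from dbt-duckdb. The function name `run_python_model_in_process` is hypothetical, and it assumes the compiled code defines the usual `model(dbt, session)` function.

```python
from types import ModuleType
from typing import Any


def run_python_model_in_process(compiled_code: str, dbt: Any, session: Any) -> Any:
    """Execute compiled model source in this process and call its model() function.

    Hypothetical helper: `dbt` is whatever context object the compiled code
    expects, and `session` would be a local SparkSession in practice.
    """
    # Use a throwaway module so the model's globals do not leak into ours.
    module = ModuleType("dbt_python_model")
    exec(compile(compiled_code, "<dbt_python_model>", "exec"), module.__dict__)

    model_fn = getattr(module, "model", None)
    if not callable(model_fn):
        raise ValueError("compiled code does not define a callable model(dbt, session)")
    # The return value is whatever the model returns, typically a DataFrame.
    return model_fn(dbt, session)
```

With PySpark installed, the `session` argument could come from `SparkSession.builder.getOrCreate()`. The trade-off is that the model then shares the dbt process and its Python environment, which is convenient locally but probably not what you want for a production cluster.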

timvw avatar Nov 21 '23 07:11 timvw