
Can we pass spark config like "spark.executor.memory" in DBT model?

Open gaoshihang opened this issue 1 year ago • 10 comments

We want to do fine-grained tuning on each model. Can we pass these Spark params to Databricks through dbt?

gaoshihang avatar Apr 08 '24 01:04 gaoshihang

We do not support this; you might be able to set these params as part of python models or with a pre-hook using SET, but I've not tested either method.

benc-db avatar Apr 08 '24 17:04 benc-db
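
For context, a minimal sketch of the (untested) Python-model route mentioned above; the pre-hook alternative would be a `pre_hook="SET key = value"` entry in the model config. `upstream_model` is a hypothetical placeholder, and only session-level confs are likely settable this way, since cluster-level settings such as `spark.executor.memory` are fixed when the cluster starts:

```python
# models/tuned_model.py -- a sketch, untested, per the caveat above.
def model(dbt, session):
    dbt.config(materialized="table")

    # Session-level Spark confs can usually be set at runtime on the
    # SparkSession dbt hands to the model. Cluster-level settings like
    # spark.executor.memory are fixed at cluster startup and are NOT
    # expected to take effect here.
    session.conf.set("spark.sql.shuffle.partitions", "64")

    # "upstream_model" is a hypothetical placeholder model name.
    return dbt.ref("upstream_model")
```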

Hi @benc-db, thank you very much for your reply! Is there a plan to support this? I think it's important for tuning Spark tasks.

gaoshihang avatar Apr 08 '24 17:04 gaoshihang

This is pretty far from what we are trying to accomplish with this library, and it opens the door to many more hard-to-debug scenarios. Honestly, I don't see us supporting it officially any time soon. If a user submitted a pull request that made this work across all of the compute types in a clean way, we might consider it.

benc-db avatar Apr 08 '24 17:04 benc-db

Thanks, got it! So is this dbt-databricks adapter mainly focused on serverless clusters?

gaoshihang avatar Apr 08 '24 17:04 gaoshihang

We're mainly focused on providing a consistent, reliable experience across SQL Warehouse (including serverless) and All-Purpose clusters. Providing this config capability is technically feasible, but as I'm the only one providing support for this library, I'm concerned about the amount of effort (including on-call/maintenance) as compared to the demand for the feature.

benc-db avatar Apr 08 '24 17:04 benc-db

Thanks @benc-db, may I ask another question here? Do you have plans to support job clusters? I found that job clusters are much cheaper than all-purpose clusters.

And if there is a feature we really want to use, can we develop it ourselves and ask you for review?

gaoshihang avatar Apr 08 '24 18:04 gaoshihang

Job clusters are only supported for Python models, as they do not have a Thrift server on them (a prerequisite for the way we execute SQL). And yes, I'm happy to review and merge user-submitted code, provided it is tested and doesn't break any existing workloads. For this particular feature, we have existing code where we SET a property on the cursor; we would need to generalize that capability and provide a config mechanism (see the sketch after this comment).

benc-db avatar Apr 08 '24 18:04 benc-db
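
A hypothetical sketch of what that generalization could look like: user-supplied session properties applied as SET statements on the cursor before the model's SQL runs. `apply_session_conf` and the config shape are invented for illustration and are not actual dbt-databricks internals:

```python
# Hypothetical helper, not real dbt-databricks code: apply user-supplied
# session properties as SET statements on the same cursor that will
# execute the model's SQL.
def apply_session_conf(cursor, session_conf: dict) -> None:
    for key, value in session_conf.items():
        # Only session-level properties are expected to take effect;
        # cluster-level settings (e.g. spark.executor.memory) are fixed
        # at cluster startup.
        cursor.execute(f"SET {key} = {value}")
```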

Thanks! You just mentioned we can use Python models on job clusters, but I didn't find any docs about this. Can you tell us more?

gaoshihang avatar Apr 08 '24 18:04 gaoshihang

Yeah, it is unofficially supported, as a) we inherited the code, and b) it's a completely separate path that isn't tested. However, enough people ask about it that at this point I think I'm stuck with it. Here's an article that may help: https://www.linkedin.com/pulse/dbt-databricks-part-2-working-python-models-newmathdata-wailc/

benc-db avatar Apr 08 '24 18:04 benc-db
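
For readers following the linked article: a rough sketch of a Python model submitted to an ephemeral job cluster, using the `submission_method` and `job_cluster_config` options from the dbt-databricks Python model configs. This path also bears on the original question, since Spark conf can be set per model on the cluster being created; all cluster values below are placeholders:

```python
# models/heavy_model.py -- a sketch; cluster values are placeholders.
def model(dbt, session):
    dbt.config(
        submission_method="job_cluster",
        # job_cluster_config mirrors the Databricks Clusters API spec.
        job_cluster_config={
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
            # Per-model Spark settings can go here, because the job
            # cluster is created fresh for this model's run.
            "spark_conf": {"spark.executor.memory": "8g"},
        },
    )
    # "upstream_model" is a hypothetical placeholder model name.
    return dbt.ref("upstream_model")
```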

Thank you very much!

gaoshihang avatar Apr 08 '24 18:04 gaoshihang

Hi @benc-db, I have tested the method you sent me and it worked; we can submit models to a job cluster. But I found that each model spins up a new job cluster. Can we put all models on one job cluster?

gaoshihang avatar Apr 12 '24 18:04 gaoshihang