dbt-databricks
Can we pass Spark configs like "spark.executor.memory" in a dbt model?
We want to do fine-grained tuning on each model. Can we pass these Spark params to Databricks through dbt?
We do not support this; you might be able to set these params as part of python models or with a pre-hook using SET, but I've not tested either method.
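For example, an untested sketch of the pre-hook approach might look like the following (note that `SET` only changes session-level SQL confs; cluster-sizing params like `spark.executor.memory` are fixed at cluster creation and can't be changed this way):

```sql
-- models/my_model.sql (hypothetical model; untested sketch)
-- The pre-hook runs SET on the same session before the model builds,
-- so only session-level SQL confs (e.g. shuffle partitions) can be tuned here.
{{ config(
    pre_hook="SET spark.sql.shuffle.partitions = 64"
) }}

select * from {{ ref('some_upstream_model') }}
```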
Hi @benc-db, thank you very much for your reply! Are there plans to support this? I think this is important for Spark tasks.
This is pretty far from what we are trying to accomplish with this library, and it opens the door to many more hard-to-debug scenarios. Honestly, I don't see us supporting it officially any time soon. If a user submitted a pull request that made this work across all of the compute types in a clean way, we might consider it.
Thanks, got it! So this dbt-databricks adapter is mainly focused on serverless clusters?
We're mainly focused on providing a consistent, reliable experience across SQL Warehouse (including serverless) and All-Purpose clusters. Providing this config capability is technically feasible, but as I'm the only one providing support for this library, I'm concerned about the amount of effort (including on-call/maintenance) as compared to the demand for the feature.
Thanks @benc-db, may I ask another question here? Do you have plans to support job clusters? I found that job clusters are much cheaper than all-purpose clusters.
And if there is a feature we really want, can we develop it ourselves and ask you to review it?
Job clusters are only supported for python models as they do not have a Thrift server on them (a pre-req for the way we execute SQL). And yes, I'm happy to review and merge user submitted code provided it is tested and doesn't break any existing workloads. For this particular feature, I see we have existing code where we SET a property on the cursor. We would need to generalize that capability/provide a config mechanism.
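As a rough, untested sketch of what that looks like today (the cluster spec values here are placeholders, not recommendations), a python model can request a job cluster through its config:

```python
# models/my_python_model.py -- untested sketch; spark_version, node_type_id,
# and num_workers are placeholder values you'd replace with your own.
def model(dbt, session):
    dbt.config(
        submission_method="job_cluster",
        job_cluster_config={
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
            # Cluster-level Spark params can go in the cluster spec itself.
            "spark_conf": {"spark.executor.memory": "8g"},
        },
    )
    # Return a DataFrame; dbt materializes it as the model's table.
    return session.table("samples.nyctaxi.trips").limit(10)
```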
Thanks! You just mentioned we can use python models on a job cluster, but I didn't find any docs about this. Can you tell us more?
Yeah, it is unofficially supported, as a.) we inherited the code, and b.) it's a completely separate path that isn't tested. However, enough people ask about it that at this point I think I'm stuck with it. Here's an article that may help: https://www.linkedin.com/pulse/dbt-databricks-part-2-working-python-models-newmathdata-wailc/
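If it helps, these python model configs can generally also be set from dbt_project.yml instead of inside the model file; a hypothetical sketch (project and model names are placeholders):

```yaml
# dbt_project.yml -- untested sketch; my_project/my_python_model are placeholders
models:
  my_project:
    my_python_model:
      +submission_method: job_cluster
      +job_cluster_config:
        spark_version: "13.3.x-scala2.12"
        node_type_id: "i3.xlarge"
        num_workers: 2
```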
Thank you very much!
Hi @benc-db, I have tested the method you sent me and it worked; we can submit models to a job cluster. But I found that each model spins up a new job cluster. Can we put all models into one job cluster?