dbt-databricks
Feature request: Support Python UDF From DBT SQL Model
Describe the feature
Being able to use Spark UDFs from dbt SQL models would be helpful.
As discussed in dbt-spark, something like registering the UDF in a pre_hook would work well: https://github.com/dbt-labs/dbt-spark/issues/135#issuecomment-852920532
{{ config(
    pre_hook=["
        def custom_df(input):
            # do some logic
            return output
        spark.udf.register('custom_df', custom_df)
    "]
) }}
select custom_df(x) from {{ ref('my_table') }}
or
{{ config(
pre_hook=['dbfs:/scripts/init_functions.py']
) }}
select custom_df(x) from {{ ref('my_table') }}
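For the second variant, the hook points at a script that registers the UDFs on the cluster before the model runs. A rough sketch of what such an init_functions.py could contain (the file name and path come from the example above; the function body is just a placeholder):

# dbfs:/scripts/init_functions.py
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def custom_df(value):
    # do some logic
    return str(value).upper()

# make the function callable from SQL as custom_df(...)
spark.udf.register("custom_df", custom_df, StringType())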
Describe alternatives you've considered
Using dbt Python models (https://docs.getdbt.com/docs/build/python-models). Ideally, though, we want to call the UDF from a dbt SQL model.
(Advanced) Use dbt Python models in a workflow
https://docs.databricks.com/en/workflows/jobs/how-to/use-dbt-in-workflows.html#advanced-use-dbt-python-models-in-a-workflow
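For reference, the Python-model alternative on dbt-databricks looks roughly like this (model and column names are made up); the drawback is that the UDF lives inside the Python model and is not callable from SQL models:

# models/my_python_model.py -- hypothetical dbt Python model applying a UDF
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def model(dbt, session):
    df = dbt.ref("my_table")

    @udf(returnType=StringType())
    def custom_df(value):
        # do some logic
        return str(value).upper()

    return df.withColumn("x_transformed", custom_df(df["x"]))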
Additional context
Running dbt in production with Python UDFs https://www.explorium.ai/blog/news-and-updates/running-dbt-production-python-udfs/
Who will this benefit?
Anyone who wants to use UDFs. Our company is migrating from Oracle PL/SQL to Databricks, and being able to use UDFs would make some functions much easier to migrate. https://www.databricks.com/blog/how-migrate-your-oracle-plsql-code-databricks-lakehouse-platform
Are you interested in contributing this feature?
Yes
Ah, maybe we can use a Databricks UDF from a macro?
Create UDF (note: we can also register it outside of dbt): https://docs.databricks.com/en/udf/unity-catalog.html#register-a-python-udf-to-uc
{% macro greet() %}
CREATE FUNCTION target_catalog.target_schema.greet(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return f"Hello, {s}"
$$
{% endmacro %}
Read UDF: add a macro that calls the UDF.
{% macro get_greet(a) %}
target_catalog.target_schema.greet({{ a }})
{% endmacro %}
Use UDF: call the UDF from dbt SQL.
SELECT
{{
  get_greet(
    a="'Jone'"
  )
}} AS greeting
FROM {{ ref('table') }}
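Since the Unity Catalog function is a regular catalog object, it can also be sanity-checked outside dbt, e.g. from a notebook (assuming the CREATE FUNCTION macro above has already been run):

# hypothetical check that the function resolves and returns the expected value
spark.sql("SELECT target_catalog.target_schema.greet('Jone') AS greeting").show()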
This might already be possible, though I'm not sure.
Here is how I've done it with DuckDB:
I define a plugin here https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7/udf/init.py and then load it in the profile here https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7_dbt/profiles.yml#L13 And finally I can use it, e.g. here https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7_dbt/models/dx7_voices.sql#L3C25-L3C39
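For comparison, the core of what that plugin does is register a Python function on the DuckDB connection so SQL models can call it. A minimal standalone sketch of the same idea (not the dbt-duckdb plugin API itself; greet is just an illustrative function):

# illustration only: register a Python UDF on a DuckDB connection
import duckdb

def greet(name: str) -> str:
    return f"Hello, {name}"

con = duckdb.connect()
con.create_function("greet", greet)  # parameter/return types inferred from the annotations
print(con.sql("SELECT greet('Jone') AS greeting").fetchall())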
What I'm not clear on is:
- how to ship the environment to the Databricks cluster
- whether the plugins are a dbt-duckdb feature or a dbt-core feature