
Feature request: Support Python UDF From DBT SQL Model

Open case-k-git opened this issue 11 months ago • 3 comments

Describe the feature


Being able to use a Spark UDF from a dbt SQL model would be helpful.

As discussed in dbt-spark, something like a pre_hook would work well: https://github.com/dbt-labs/dbt-spark/issues/135#issuecomment-852920532

{{ config(
    pre_hook=["
        def custom_df(input):
            # do some logic
            return output

        spark.udf.register('custom_df', custom_df)
    "]
) }}

select custom_df(x) from {{ ref('my_table') }}

or

{{ config(
    pre_hook=['dbfs:/scripts/init_functions.py']
) }}

select custom_df(x) from {{ ref('my_table') }}
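
Since pre_hook strings are executed as SQL today, the closest thing that should already work is creating a Unity Catalog Python UDF in the hook instead of registering a Spark UDF. A sketch; the catalog, schema, and function body are placeholders:

{{ config(
    pre_hook=["
        CREATE OR REPLACE FUNCTION target_catalog.target_schema.custom_df(x STRING)
        RETURNS STRING
        LANGUAGE PYTHON
        AS $$
          return x.upper()
        $$
    "]
) }}

select target_catalog.target_schema.custom_df(x) from {{ ref('my_table') }}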

Describe alternatives you've considered


Using dbt Python models (a minimal sketch follows below). Ideally, though, we want to use the UDF from a dbt SQL model: https://docs.getdbt.com/docs/build/python-models

(Advanced) Use dbt Python models in a workflow

https://docs.databricks.com/en/workflows/jobs/how-to/use-dbt-in-workflows.html#advanced-use-dbt-python-models-in-a-workflow
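
For comparison, here is how a dbt Python model on Databricks could define and apply the UDF inline; the model, column names, and logic are hypothetical:

import pandas as pd
from pyspark.sql.functions import pandas_udf

def model(dbt, session):
    # dbt.ref returns a Spark DataFrame on Databricks
    df = dbt.ref("my_table")

    # define the UDF next to the model instead of registering it globally
    @pandas_udf("string")
    def custom_df(s: pd.Series) -> pd.Series:
        # do some logic
        return s.str.upper()

    return df.withColumn("y", custom_df(df["x"]))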

Additional context


Running dbt in production with Python UDFs https://www.explorium.ai/blog/news-and-updates/running-dbt-production-python-udfs/

Who will this benefit?


Anyone who wants to use UDFs. Our company is migrating from Oracle PL/SQL to Databricks; if we could use UDFs, some functions would be much easier to migrate. https://www.databricks.com/blog/how-migrate-your-oracle-plsql-code-databricks-lakehouse-platform

Are you interested in contributing this feature?


Yes

case-k-git · Mar 02 '24 06:03

Ah, maybe we can use a Databricks UDF from a macro?

Create UDF. Note: we can also register the function outside of dbt: https://docs.databricks.com/en/udf/unity-catalog.html#register-a-python-udf-to-uc

{% macro greet() %}
  CREATE OR REPLACE FUNCTION target_catalog.target_schema.greet(s STRING)
  RETURNS STRING
  LANGUAGE PYTHON
  AS $$
    return f"Hello, {s}"
  $$
{% endmacro %}
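
One detail: a macro body only runs where it is rendered, so the CREATE FUNCTION needs to be invoked somewhere, e.g. as an on-run-start hook in dbt_project.yml (a sketch, assuming the greet macro above):

on-run-start:
  - "{{ greet() }}"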

Call UDF. Add a macro that renders a call to the UDF:

{% macro get_greet(a) %}
  `target_catalog`.`target_schema`.`greet`({{ a }})
{% endmacro %}

Use UDF. Call the UDF from a dbt SQL model:

SELECT
  {{
    get_greet(
      a="'John'"
    )
  }} AS greeting
FROM {{ ref('table') }}
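
If I understand the Jinja rendering correctly, the model above should compile to SQL along these lines (the ref resolution depends on your target):

SELECT
  `target_catalog`.`target_schema`.`greet`('John') AS greeting
FROM target_catalog.target_schema.table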

case-k-git · Mar 04 '24 15:03

This might already be possible, though I'm not sure.

Here is how I've done it with DuckDB:

  • I define a plugin here: https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7/udf/init.py
  • then load it in the profile here: https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7_dbt/profiles.yml#L13
  • and finally use it, e.g. here: https://github.com/Nintorac/s4_dx7/blob/main/s4_dx7_dbt/models/dx7_voices.sql#L3C25-L3C39 (a rough sketch of the plugin shape follows below)

What I'm not clear on is:

  • how to ship the environment to the Databricks cluster
  • whether plugins are a dbt-duckdb feature or a dbt-core feature
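
For what it's worth, the plugin shape in dbt-duckdb looks roughly like this; a sketch from memory of the dbt-duckdb BasePlugin API, so treat the module path and function details as assumptions (and as far as I know, plugins are a dbt-duckdb adapter feature, not dbt-core):

# my_project/udf_plugin.py (hypothetical module path)
from duckdb import DuckDBPyConnection
from dbt.adapters.duckdb.plugins import BasePlugin

class Plugin(BasePlugin):
    def configure_connection(self, conn: DuckDBPyConnection):
        # type annotations let DuckDB infer the UDF's signature
        def custom_df(x: str) -> str:
            return x.upper()

        conn.create_function("custom_df", custom_df)

and loaded in profiles.yml under the duckdb target:

plugins:
  - module: my_project.udf_plugin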

Nintorac · Apr 10 '24 22:04