
Python model serverless workflow doesn't accept environment libraries

dustinvannoy-db opened this issue 8 months ago · 9 comments

Describe the bug

I am testing Python models submitted to serverless job compute. I tried two submission methods: workflow_job and serverless_cluster. I need to add a library dependency but cannot get it to work with workflow_job.

Steps To Reproduce

Model code that attempts to use a library that needs to be installed from PyPI while running on serverless job compute:

from faker import Faker

def model(dbt, session):
    dbt.config(
        submission_method='workflow_job',
        environment_key="my_env",
        environment_dependencies=["faker==37.0.2"])

    my_sql_model_df = dbt.ref("CustomerIncremental")

    fake = Faker()
    print(fake.name())

    final_df = my_sql_model_df.selectExpr("*").limit(100)

    return final_df

Response:

Runtime Error in model DimCustomer3 (Databricks/models/main/python/DimCustomer3.py)
  Python model failed with traceback as:
  (Note that the line number here does not match the line number in your code due to dbt templating)
  ModuleNotFoundError: No module named 'faker'
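For context, my understanding from the Databricks Jobs API docs is that a serverless task only picks up PyPI dependencies when the job carries a job-level environments block that the task references via environment_key. A sketch (in Python, building the request payload; the job name, task key, and notebook path are placeholders, not values dbt actually uses) of what the workflow_job submission would presumably need to send:

```python
# Sketch of a Databricks Jobs API job settings payload for a serverless task
# with PyPI dependencies. Field names follow the Jobs API docs; the name,
# task_key, and notebook_path values are hypothetical placeholders.
job_settings = {
    "name": "dbt_python_model_DimCustomer3",
    # Job-level environment definition: serverless dependencies live here.
    "environments": [
        {
            "environment_key": "my_env",
            "spec": {
                "client": "1",
                "dependencies": ["faker==37.0.2"],
            },
        }
    ],
    "tasks": [
        {
            "task_key": "dbt_python_model",
            # Links this task to the job-level environment above; without
            # this reference the serverless task gets no extra libraries.
            "environment_key": "my_env",
            "notebook_task": {"notebook_path": "/path/to/model_notebook"},
        }
    ],
}
```

The error above suggests the workflow_job path never attaches this environments block (or the task-level environment_key), so the library is never installed.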

Expected behavior

If I use the same code with the serverless_cluster submission method, it works. workflow_job should behave the same way.

from faker import Faker

def model(dbt, session):
    dbt.config(
        submission_method='serverless_cluster',
        environment_key="my_env",
        environment_dependencies=["faker==37.0.2"])

    my_sql_model_df = dbt.ref("CustomerIncremental")

    fake = Faker()
    print(fake.name())

    final_df = my_sql_model_df.selectExpr("*").limit(100)

    return final_df

Response:

04:18:26 Finished running 1 table model in 0 hours 1 minutes and 55.52 seconds (115.52s).
04:18:26 Completed successfully
04:18:26 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Screenshots and log output


serverless_cluster working environment: (screenshot attached)

workflow_job, no environment shown: (screenshot attached)

System information

The output of dbt --version:

Core:
  - installed: 1.9.3
  - latest:    1.9.4 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - databricks: 1.9.7 - Update available!
  - spark:      1.8.0 - Update available!

The operating system you're using: macOS Sequoia 15.4

The output of python --version: Python 3.10.17

Additional context


dustinvannoy-db · May 02 '25 04:05