dbt-core

[Feature] Filter standard library packages out of Python models' `packages` config

Open gwenwindflower opened this issue 1 year ago • 4 comments

Is this your first time submitting a feature request?

  • [X] I have read the expectations for open source contributors
  • [X] I have searched the existing issues, and I could not find an existing issue for this feature
  • [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Right now, if a user wants to use `re`, `os`, etc. in a Python model, they might reasonably think they need to add it to the `packages` list config argument of the model. In fact, dbt will throw a 'package not found' error for packages that aren't third-party. The Right Way at present is to just import and use them, but we don't flag that anywhere in the docs. It would be good to filter out the standard library packages and perhaps throw a warning instead of an error here, letting people know that listing them isn't necessary, but still proceeding.

At present you need to do this, which is not super obvious:

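import pandas as pd
import numpy as np
import re  # standard library: import it, but do not list it in packages

def model(dbt, session):
    # dbt configuration: list only third-party packages
    dbt.config(packages=["pandas", "numpy"])

    # a Python model must return a DataFrame
    df = pd.DataFrame({"hello": ["world"]})
    return df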

Describe alternatives you've considered

  • Updating the docs to make this more clear
  • Throwing a clearer error
  • Filtering the packages and not throwing a warning at all, just ignoring the extra entries

Who will this benefit?

Users of Python models.

Are you interested in contributing this feature?

No

Anything else?

gwenwindflower, Apr 08 '24 17:04

Thanks for opening this, @gwenwindflower!

Which adapter did you use? Could you provide a simple dbt python model that exhibits this issue?

Was it dbt-snowflake with a model like this, by any chance?

import pandas as pd
import numpy as np
import re

def model(dbt, session):
    dbt.config(packages=["pandas", "numpy", "re"])

    df = pd.DataFrame({"hello": ["world"]})
    return df

And an error like this?

00:23:57    Database Error in model my_python_model (models/my_python_model.py)
  100357 (P0000): Cannot create a Python function with the specified packages. Please check your packages specification and try again.
  compiled Code at target/run/my_project/models/my_python_model.py

dbeatty10, Apr 09 '24 00:04

Hey @dbeatty10, sorry for the lack of a firsthand repro. I reported this on behalf of a user in the Community, so I didn't get the error myself! @aranke suggested it could be worthwhile to just fix this rather than updating the docs, and I tend to agree, particularly with the idea of a clear warning in place of a mysterious error. Based on my conversation with the Community member, this looks like exactly the simplified version of the model he was creating and the error that confused him. Here's a link to the thread.

gwenwindflower, Apr 09 '24 12:04

@aranke could you share the details of your proposed approach for this scenario?

If you can provide links to the relevant area(s) of the source code, that would be even better.

dbeatty10, Apr 10 '24 01:04

Code: TK

Python built-in modules: https://docs.python.org/3/library/sys.html#sys.builtin_module_names

aranke, Apr 15 '24 14:04