[Feature] Filter standard library packages out of Python models' `packages` config
Is this your first time submitting a feature request?
- [X] I have read the expectations for open source contributors
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
- [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Right now, if a user wants to use `re`, `os`, etc. in a Python model, they could reasonably assume they need to add it to the `packages` list in the model's config. In fact, dbt will throw a 'package not found' error for packages that aren't third-party, i.e. standard library modules. The right approach at present is to just import and use them, but we don't flag that anywhere in the docs. It would be good to filter standard library packages out of `packages` and perhaps raise a warning instead of an error, letting people know the entry isn't necessary while still proceeding.
At present you need to do this, which is not super obvious:
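```python
import pandas as pd
import numpy as np
import re  # standard library: imported, but not listed in `packages`


def model(dbt, session):
    # dbt configuration
    dbt.config(packages=["pandas", "numpy"])
    # ... model logic that returns a DataFrame ...
```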
Describe alternatives you've considered
- Updating the docs to make this more clear
- Throwing a clearer error
- Filtering the packages and not throwing a warning at all, just ignoring the extra code
Who will this benefit?
Users of Python models.
Are you interested in contributing this feature?
No
Anything else?
Thanks for opening this, @gwenwindflower!
Which adapter did you use? Could you provide a simple dbt python model that exhibits this issue?
Was it dbt-snowflake with a model like this, by any chance?
```python
import pandas as pd
import numpy as np
import re


def model(dbt, session):
    dbt.config(packages=["pandas", "numpy", "re"])
    df = pd.DataFrame({"hello": ["world"]})
    return df
```
And an error like this?
```
00:23:57  Database Error in model my_python_model (models/my_python_model.py)
  100357 (P0000): Cannot create a Python function with the specified packages. Please check your packages specification and try again.
  compiled Code at target/run/my_project/models/my_python_model.py
```
hey @dbeatty10, sorry for the lack of a firsthand repro. I reported this on behalf of a user in the Community, so I didn't hit the error myself! @aranke suggested it could be worthwhile to just fix this rather than update the docs, and I tend to agree, particularly with the idea of a clear warning over a mysterious error. Based on my conversation with the community member, this looks like exactly the simplified version of the model he was creating and the error that confused him. Here's a link to the thread.
@aranke could you share the details of your proposed approach for this scenario?
If you can provide links to the relevant area(s) of the source code, that would be even better.
Code: TK
Python built-in modules: https://docs.python.org/3/library/sys.html#sys.builtin_module_names
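To make the filter-and-warn idea concrete, here is a minimal sketch, not dbt's actual implementation. The helper name `filter_stdlib_packages` and its return shape are hypothetical. It assumes the check runs in the local interpreter and prefers `sys.stdlib_module_names` (Python 3.10+), because `sys.builtin_module_names` only lists modules compiled into the interpreter and so does not include pure-Python modules such as `re`.

```python
import re
import sys
import warnings


def _stdlib_module_names() -> frozenset:
    """Return standard library module names for the running interpreter."""
    # sys.stdlib_module_names exists on Python 3.10+. On older interpreters,
    # fall back to the much smaller set of compiled-in modules.
    names = getattr(sys, "stdlib_module_names", None)
    if names is None:
        names = sys.builtin_module_names
    return frozenset(names)


def filter_stdlib_packages(packages):
    """Hypothetical helper (not part of dbt): split `packages` into
    entries to keep and standard library entries to drop, with a warning."""
    stdlib = _stdlib_module_names()
    kept, dropped = [], []
    for spec in packages:
        # Strip any version specifier or extras, e.g. "numpy==1.23.1" -> "numpy".
        name = re.split(r"[=<>!~\s\[]", spec.strip(), maxsplit=1)[0]
        if name in stdlib:
            dropped.append(spec)
        else:
            kept.append(spec)
    if dropped:
        warnings.warn(
            "Ignoring standard library modules in `packages`: "
            + ", ".join(dropped)
            + ". Import them directly; they do not need to be listed.",
            UserWarning,
        )
    return kept, dropped
```

On Python 3.10+, `filter_stdlib_packages(["pandas", "numpy", "re"])` would keep `["pandas", "numpy"]` and drop `["re"]` with a warning. Two caveats under this assumption: the interpreter running dbt may differ from the warehouse's Python runtime, and a package's install name does not always match its import name, so a real fix would need to decide which environment's module list is authoritative.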