dbt-duckdb
Allow import from other python modules
Currently, it is not possible to import your own packages in a Python model. For example, I have some transformations in Python modules that import code from another package.
A dbt run does not find my own packages. The screenshot shows an example with a module; I hacked a few lines together just for the screenshot, but the module exists and runs fine locally when I use the Python CLI instead of dbt.
How can this be solved?
Ah, so there are a couple of ways. One is to create some sort of Python environment for the dbt run (Docker container, virtualenv, whatever floats your boat), install your package inside of it, and then use it from dbt-duckdb.
Of course, that's a decent amount of work and kind of annoying if the logic inside the custom module changes frequently, so I added a poorly documented profile setting called module_paths. In your profile you can point dbt at a list of directories to add to sys.path on startup, so that you can use them in your Python models without going through all of that environment setup.
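To make the sys.path idea concrete, here is a minimal sketch of the mechanism in plain Python. It is not the adapter's actual code: add_module_paths is a hypothetical helper, and the lib directory and my_module.py file are made-up names created in a temporary location purely for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_module_paths(paths):
    """Append each directory to sys.path unless it is already present."""
    for p in paths:
        resolved = str(Path(p).resolve())
        if resolved not in sys.path:
            sys.path.append(resolved)


# Build a throwaway "lib" directory containing my_module.py (hypothetical names).
lib_dir = Path(tempfile.mkdtemp()) / "lib"
lib_dir.mkdir()
(lib_dir / "my_module.py").write_text("def foo():\n    return 1729\n")

# Once the directory is on sys.path, a plain import finds the module.
add_module_paths([lib_dir])
importlib.invalidate_caches()
my_module = importlib.import_module("my_module")
result = my_module.foo()
print(result)
```

The point is only that anything importable from a directory on sys.path becomes importable from a Python model as well, with no packaging or installation step.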
Thanks for the update. I saw that setting, but was not able to get it up and running.
- The first solution might work, but our teams usually make a lot of code changes, so it is not very practical.
- I went for solution number 2, and this is what I tried: I created a sample module in the directory my_module and then tried to import it, but I still get this error. Do you have a working example in the repository?
17:04:41 Runtime Error in model stg_article_with_categories (models/python/stg_article_with_categories.py) Python model failed: No module named 'my_module'
Which version of the project? And you did the import like import example?
I should have said: I have an example of a project that uses module_paths here: https://github.com/jwills/jaffle_shop_duckdb
Yes, I did. What I did not do was additionally register the module in profiles.yml. So to round it up, here are my steps in case anyone needs them :)
How to use your own modules
In order to use your own Python code in any Python model, a new plugin needs to be created and registered. First, set the appropriate configuration in your profiles.yml.
The important configuration options are module_paths and plugins. You can pass any number of directories as a list to module_paths. For plugins, use the module name, i.e. the filename of your module without the .py extension.
local_warehouse:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: db/warehouse.duckdb
      threads: 24
      module_paths:
        - lib
      plugins:
        - module: my_module
This example reflects the following file/directory structure:
├── lib
│   └── my_module.py
For the plugin, create a class that extends the dbt.adapters.duckdb.plugins.BasePlugin class. Make sure that the name of each function is unique across all registered functions!
from duckdb import DuckDBPyConnection

from dbt.adapters.duckdb.plugins import BasePlugin
from dbt.adapters.duckdb.utils import TargetConfig


def foo() -> int:
    return 1729


# The python module that you create must have a class named "Plugin"
# which extends the `dbt.adapters.duckdb.plugins.BasePlugin` class.
class Plugin(BasePlugin):
    def configure_connection(self, conn: DuckDBPyConnection):
        conn.create_function("foo", foo)
The function can now be used in any model like this:
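import pandas as pd
import my_module


def model(dbt, session):
    # A model has to return something DuckDB can turn into a table.
    return pd.DataFrame({"foo": [my_module.foo()]})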
Another question: can you return any value? It seems that more complex types like dicts result in this error:
20:32:02 Encountered an error:
Runtime Error
Invalid Input Error: Could not infer the return type, please set it explicitly
This would be the example:
def dict_test() -> any:
    return {"key": "value"}
Mmm, return from what? The return type of the model function needs to be something DuckDB can turn into a table, but for your own utility functions you should be able to return anything you want.