
[CT-3079] [Bug] When a file name is prefixed with dbt_, the Python script runs twice

Status: Open. njrs92 opened this issue 2 years ago • 4 comments

Is this a new bug in dbt-core?

  • [X] I believe this is a new bug in dbt-core
  • [X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

This is quite an edge case, but if a file is named dbt_something.py, it runs twice:

from dbt.cli.main import dbtRunner, dbtRunnerResult
db_cmd = dbtRunner()
res: dbtRunnerResult = db_cmd.invoke(["build", "--select", "nothing"])
print("This should only be seen once")

Expected Behavior

The print statement runs only once.

Steps To Reproduce

Have a working dbt project, then run dbt from Python in a file named dbt_something.py:

# dbt_something.py, run from the root of a working dbt project
from dbt.cli.main import dbtRunner, dbtRunnerResult
db_cmd = dbtRunner()
res: dbtRunnerResult = db_cmd.invoke(["build", "--select", "nothing"])
print("This should only be seen once")

Relevant log output

05:17:39  Running with dbt=1.6.1
05:17:39  Running with dbt=1.6.1
05:17:40  Registered adapter: redshift=1.6.1
05:17:40  Found 184 models, 34 analyses, 10 seeds, 2 operations, 372 tests, 108 sources, 10 exposures, 0 metrics, 1176 macros, 0 groups, 0 semantic models
05:17:40  The selection criterion 'nothing' does not match any nodes
05:17:40
05:17:40  Nothing to do. Try checking your model configs and model specification args
This should only be seen once
05:17:41  Registered adapter: redshift=1.6.1
05:17:42  Found 184 models, 34 analyses, 10 seeds, 2 operations, 372 tests, 108 sources, 10 exposures, 0 metrics, 1176 macros, 0 groups, 0 semantic models
05:17:42  The selection criterion 'nothing' does not match any nodes
05:17:42
05:17:42  Nothing to do. Try checking your model configs and model specification args
This should only be seen once

Environment

- OS: Windows 10
- Python: 3.10.11
- dbt: 1.6.1

Which database adapter are you using with dbt?

redshift

Additional Context

No response

njrs92, Sep 06 '23 05:09

Thanks for reporting this @njrs92 !

Yes, I see what you mean: a Python script that uses programmatic invocations executes twice when its file name starts with dbt_.

It might have something to do with this: https://github.com/dbt-labs/dbt-core/blob/7e2a08f3a5a873d37d7e9de1ada935a5d78c3b22/core/dbt/plugins/manager.py#L67
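If that line is the cause, the mechanism can be reproduced without dbt at all. The sketch below is a stand-in, not dbt's code: `discover_prefixed_modules` is a hypothetical helper that, assuming the linked plugin manager works this way, lists every top-level module on `sys.path` with `pkgutil.iter_modules()` and imports those whose names start with `dbt_`. Because `python dbt_something.py` puts the script's own directory at `sys.path[0]`, the script is found again and imported under its module name, so its top-level code runs a second time (the first run was as `__main__`).

```python
import importlib
import pkgutil
import runpy
import sys
import tempfile
from pathlib import Path

# Body of the simulated script: append __name__ to a log on every execution.
SCRIPT_BODY = (
    "from pathlib import Path\n"
    "with open(Path(__file__).with_suffix('.log'), 'a') as log:\n"
    "    log.write(__name__ + '\\n')\n"
)


def discover_prefixed_modules(prefix: str = "dbt_") -> dict:
    """Hypothetical stand-in for prefix-based plugin discovery: import every
    top-level module on sys.path whose name starts with `prefix`."""
    return {
        info.name: importlib.import_module(info.name)
        for info in pkgutil.iter_modules()
        if info.name.startswith(prefix)
    }


workdir = Path(tempfile.mkdtemp())
script = workdir / "dbt_reprex_runner.py"
script.write_text(SCRIPT_BODY)

# `python dbt_reprex_runner.py` puts the script's directory at sys.path[0] ...
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
# ... and executes the file as __main__ (first execution).
runpy.run_path(str(script), run_name="__main__")

# Discovery then imports the same file as `dbt_reprex_runner` (second execution).
# The narrow prefix keeps this demo from importing any real dbt_* packages.
discover_prefixed_modules(prefix="dbt_reprex_")

runs = script.with_suffix(".log").read_text().split()
print(runs)  # ['__main__', 'dbt_reprex_runner']
```

Calling the discovery helper again does not execute the file a third time, because `dbt_reprex_runner` is already in `sys.modules`. That may explain why the log above shows exactly two runs rather than endless recursion.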

We'd welcome a PR to fix this, and we've marked this as help_wanted accordingly. In the meantime, we'd recommend not having any Python scripts that start with dbt_ in the current working directory.
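A quick pre-flight check can list what a `dbt_` prefix filter would pick up in a directory. This is a sketch, not part of dbt: `find_dbt_prefixed_modules` is a hypothetical helper. It checks both the running script's directory (which Python places at `sys.path[0]`) and the current working directory, since the last reprex case shows that a `dbt_` file sitting next to the script is enough. It also reports packages (directories containing `__init__.py`), which `pkgutil.iter_modules` lists alongside plain `.py` files.

```python
import pkgutil
import sys
from pathlib import Path


def find_dbt_prefixed_modules(directory, prefix: str = "dbt_") -> list:
    """Return names of top-level modules/packages in `directory` that a
    prefix-based discovery could import (hypothetical helper)."""
    return sorted(
        info.name
        for info in pkgutil.iter_modules([str(directory)])
        if info.name.startswith(prefix)
    )


if __name__ == "__main__":
    # sys.path[0] is the script's directory ('' in an interactive session).
    candidates = {Path(sys.path[0] or ".").resolve(), Path.cwd().resolve()}
    for candidate in candidates:
        hits = find_dbt_prefixed_modules(candidate)
        if hits:
            print(f"Warning: {candidate} contains dbt_-prefixed modules: {hits}")
```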

See below for the reproducible example that I used.

Reprex

Set up all the project files:

cat <<EOF >>runner.py
from dbt.cli.main import dbtRunner, dbtRunnerResult


db_cmd = dbtRunner()
res: dbtRunnerResult = db_cmd.invoke(["build", "--select", "nothing"])

print(f"Running this script: {__file__}")

EOF

cat <<EOF >>dbt_project.yml
name: "my_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle"

clean-targets:
  - target
  - dbt_packages
  - logs

EOF

cat <<EOF >>profiles.yml
sandcastle:
  target: duckdb
  outputs:
    duckdb:
      type: duckdb
      path: 'db.db'

EOF

Create the virtual environment and install dbt-duckdb:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade --pre dbt-duckdb~=1.6.0 dbt-core~=1.6.0
source .venv/bin/activate
dbt --version

✅ This works fine:

python runner.py

✅ Also works fine:

mv runner.py runner_dbt.py
python runner_dbt.py

❌ This runs twice:

mv runner_dbt.py dbt_runner.py
python dbt_runner.py

✅ Works fine again:

mv dbt_runner.py runner.py
python runner.py

❌ Runs twice:

cp runner.py dbt_runner.py
python runner.py

Deactivate the virtual environment afterwards:

deactivate

dbeatty10, Sep 06 '23 20:09

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot], Feb 14 '25 01:02

Do we have any more guidance on how to work around this issue? It seems like a security issue for dbt to run any file whose name starts with 'dbt_'.

I'm commenting to try to keep this open, as this could cause a lot of problems given that dbt runs with elevated privileges on the data.

I can also poke about and see if I can find an efficient fix, but I'm new to the codebase and so it's a hope more than a plan :)
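One workaround for the double run (a suggestion, not something the maintainers confirmed in this thread) is the standard `if __name__ == "__main__":` guard. When discovery imports the file as `dbt_something`, `__name__` is not `"__main__"`, so `invoke` is skipped. The module is still imported, so top-level code outside the guard still runs, and the guard does nothing about the security concern, since a malicious `dbt_` file would simply omit it. In the sketch below the dbt import is moved inside `main()` only so the demo can run without dbt installed; the file name and temp-directory setup are assumptions for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# dbt_something.py from the report, rewritten with a __main__ guard.
GUARDED_SCRIPT = '''\
def main() -> None:
    from dbt.cli.main import dbtRunner, dbtRunnerResult

    db_cmd = dbtRunner()
    res: dbtRunnerResult = db_cmd.invoke(["build", "--select", "nothing"])
    print("This should only be seen once")


if __name__ == "__main__":
    main()
'''

workdir = Path(tempfile.mkdtemp())
(workdir / "dbt_something.py").write_text(GUARDED_SCRIPT)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

# Importing the file (as prefix-based discovery would) only defines main();
# dbt is never imported or invoked.
module = importlib.import_module("dbt_something")
print(callable(module.main))  # True
```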

matthew-paul-1024, Mar 20 '25 15:03

The problem seems to be here: https://github.com/dbt-labs/dbt-core/blob/7e2a08f3a5a873d37d7e9de1ada935a5d78c3b22/core/dbt/plugins/manager.py#L96

All Python modules on sys.path whose names start with dbt_, including plain *.py scripts, are imported there. The path of pkgutil.iter_modules could be restricted to dbt_packages/ (see here), but this would prevent adding custom modules to dbt. I am not sure what the expected or intended behavior is.
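A minimal sketch of that restriction, assuming plugins live in a single directory: `pkgutil.iter_modules([directory])` lists only that directory, and each result's `module_finder` can load the module directly, so scripts elsewhere on `sys.path` are never touched. `load_plugins_from` and the `dbt_packages/dbt_example_plugin.py` layout are hypothetical; this is not how dbt currently loads plugins.

```python
import importlib.util
import pkgutil
import sys
import tempfile
from pathlib import Path


def load_plugins_from(directory: Path, prefix: str = "dbt_") -> dict:
    """Hypothetical restricted discovery: import only prefixed modules found
    directly in `directory`, ignoring the rest of sys.path."""
    plugins = {}
    for info in pkgutil.iter_modules([str(directory)]):
        if not info.name.startswith(prefix):
            continue
        spec = info.module_finder.find_spec(info.name)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        sys.modules[info.name] = module  # register before executing, as import does
        spec.loader.exec_module(module)
        plugins[info.name] = module
    return plugins


root = Path(tempfile.mkdtemp())
(root / "dbt_packages").mkdir()
(root / "dbt_packages" / "dbt_example_plugin.py").write_text("NAME = 'plugin'\n")
# A dbt_ script on sys.path that must NOT be imported.
(root / "dbt_runner.py").write_text("raise RuntimeError('should not be imported')\n")
sys.path.insert(0, str(root))  # as with `python dbt_runner.py`

plugins = load_plugins_from(root / "dbt_packages")
print(sorted(plugins))  # ['dbt_example_plugin']
```

As noted above, the trade-off is that custom modules living anywhere else can no longer register themselves as plugins.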

I think @matthew-paul-1024 is right: running any Python script on sys.path could be a security risk. However, my proposed solution would not solve this. Taking plugin security seriously would probably require some kind of plugin signatures and a registry.
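Real signing is beyond a comment, but a much weaker stand-in for the registry idea is a hash allowlist: before importing a discovered file, compare its SHA-256 digest with digests recorded when the plugin was vetted. This is not a signature scheme (there is no publisher identity), only a pin on exact file contents; `APPROVED_SHA256`, `sha256_of` and `is_approved` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_approved(path: Path, approved: dict[str, str]) -> bool:
    """True only if the module name is registered AND the file contents
    match the pinned digest (hypothetical allowlist check)."""
    expected = approved.get(path.stem)
    return expected is not None and expected == sha256_of(path)


workdir = Path(tempfile.mkdtemp())
plugin = workdir / "dbt_example_plugin.py"
plugin.write_text("NAME = 'plugin'\n")
stray = workdir / "dbt_runner.py"
stray.write_text("print('not approved')\n")

# The "registry": module name -> digest recorded when the plugin was vetted.
APPROVED_SHA256 = {"dbt_example_plugin": sha256_of(plugin)}

print(is_approved(plugin, APPROVED_SHA256))  # True
print(is_approved(stray, APPROVED_SHA256))   # False: not registered

plugin.write_text("NAME = 'tampered'\n")
print(is_approved(plugin, APPROVED_SHA256))  # False: contents changed
```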

TomAtGithub, Jun 02 '25 14:06