[Bug] dbt's custom exceptions inside a multiprocessing context hang

Open keraion opened this issue 6 months ago • 3 comments

Is this a new bug in dbt-core?

  • [X] I believe this is a new bug in dbt-core
  • [X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

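While debugging sqlfluff/sqlfluff#6037, I found that dbt appears to hang when a dbt exception is raised inside a multiprocessing worker. The exception does not appear to survive pickling: reconstructing it in the parent process fails (see the TypeError in the log output below), which prevents further execution.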

Expected Behavior

The exceptions should implement __reduce__ to allow pickling and prevent hanging.
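A minimal sketch of what that could look like, using a hypothetical stand-in class (`PicklableTargetNotFound` and its message format are my own, not dbt's actual code). By default, `BaseException.__reduce__` rebuilds the exception by calling the class with `self.args`, which here holds only the message, so a constructor with three required arguments needs its own `__reduce__`:

```python
import pickle


class PicklableTargetNotFound(Exception):
    """Hypothetical stand-in for a dbt exception with required init args."""

    def __init__(self, node, target_name, target_kind):
        self.node = node
        self.target_name = target_name
        self.target_kind = target_kind
        super().__init__(
            f"{node} depends on a {target_kind} named {target_name!r} "
            "which was not found"
        )

    def __reduce__(self):
        # Tell pickle to rebuild the exception from the original
        # constructor arguments instead of from self.args.
        return (type(self), (self.node, self.target_name, self.target_kind))


err = PicklableTargetNotFound("my_first_dbt_model", "abc", "node")
restored = pickle.loads(pickle.dumps(err))
print(type(restored).__name__, restored.target_name)
```

With `__reduce__` in place the round trip keeps the exception type and its attributes, which is what a multiprocessing pool needs in order to hand the error back to the parent process.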

Steps To Reproduce

For these reproduction steps I'm using dbt-duckdb, but this applies to all adapters.

  1. Using the example models, make the first model raise a compilation error:
--my_first_dbt_model.sql
SELECT * from {{ ref("abc") }}
  2. Call dbt run from a Python multiprocessing context:
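import multiprocessing as mp

from dbt.cli.main import cli


def run_dbt():
    # Build a click context for `dbt run` and invoke it in the worker process.
    ctx = cli.make_context(cli.name, ["run"])
    cli.invoke(ctx)


if __name__ == "__main__":
    # The guard keeps the script safe under the "spawn" start method.
    with mp.Pool() as pool:
        pool.apply(run_dbt)  # hangs when the worker raises a dbt exception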

Relevant log output

02:42:36  [WARNING]: Deprecated functionality

User config should be moved from the 'config' key in profiles.yml to the 'flags' key in dbt_project.yml.
02:42:36  Running with dbt=1.8.4
02:42:37  Registered adapter: duckdb=1.8.2
02:42:37  Unable to do partial parsing because of a version mismatch
02:42:38  Encountered an error:
Compilation Error
  Model 'model.test_dbt.my_first_dbt_model' (project2/models/example/my_first_dbt_model.sql) depends on a node named 'abc' which was not found
Exception in thread Thread-8 (_handle_results):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 579, in _handle_results
    task = get()
           ^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TargetNotFoundError.__init__() missing 3 required positional arguments: 'node', 'target_name', and 'target_kind'
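The same failure mode can be sketched without dbt or a process pool. The class below is a hypothetical stand-in (`TargetNotFoundDemo` is not dbt's class, and I'm assuming the real one leaves `args` empty, which is what the "missing 3" in the message suggests). Pickling succeeds in the "worker", and the TypeError only appears on load, which matches the traceback coming from the pool's `_handle_results` thread in the parent:

```python
import pickle


class TargetNotFoundDemo(Exception):
    """Hypothetical stand-in, not dbt's actual class."""

    def __init__(self, node, target_name, target_kind):
        self.node = node
        self.target_name = target_name
        self.target_kind = target_kind
        # No positional args reach Exception.__init__, so self.args is ().
        super().__init__()


# Dumping works: the default __reduce__ records (class, self.args, __dict__).
payload = pickle.dumps(TargetNotFoundDemo("my_first_dbt_model", "abc", "node"))

# Loading calls TargetNotFoundDemo(*()) and fails before state is restored.
try:
    pickle.loads(payload)
    load_error = None
except TypeError as exc:
    load_error = exc

print(load_error)
```

Because the error is raised while the parent is reading the result, the caller never receives either a result or an exception, which would explain why `pool.apply` does not return.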

Environment

- OS: Ubuntu 20.04
- Python: 3.11.9
- dbt: 1.8.4

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

As noted above, I'm using dbt-duckdb. The main entry point for this error will most likely be sqlfluff-templater-dbt.

In sqlfluff, monkeypatching __reduce__ prevents the process from hanging.

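# sqlfluff_templater_dbt/templater.py
import inspect

# DbtBaseException is dbt's base exception class (its import is not shown here).


def _dbt_exception_reduce(self):
    # Rebuild the exception from its __init__ parameters. This relies on each
    # parameter being stored on the instance under the same attribute name.
    return (
        type(self),
        tuple(
            getattr(self, arg)
            for arg in inspect.getfullargspec(self.__init__).args
            if arg != "self"
        ),
    )


DbtBaseException.__reduce__ = _dbt_exception_reduce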

keraion, Aug 06 '24 03:08