
dbt-databricks tries to establish connection to Databricks when running `dbt parse`

Open ghjklw opened this issue 1 year ago • 5 comments

Describe the bug

According to the dbt documentation, running dbt parse should work in an isolated environment with no Databricks workspace available. From the dbt parse documentation:

Starting in v1.5, dbt parse will write or return a manifest, enabling you to introspect dbt's understanding of all the resources in your project. Since dbt parse doesn't connect to your warehouse, this manifest will not contain any compiled code.

This is especially useful when building a CI/CD pipeline in which you want to generate a manifest.json file.
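As an illustration of why the manifest matters in CI, here is a minimal sketch of a pipeline step that inspects it. It assumes the default `target/manifest.json` location and relies only on the top-level `nodes` mapping and each node's `resource_type` field; the function names are hypothetical and not part of dbt.

```python
import json
from collections import Counter
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest written by `dbt parse` (default path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_manifest(manifest: dict) -> Counter:
    """Count the resources under `nodes` by their `resource_type`."""
    nodes = manifest.get("nodes", {})
    return Counter(node["resource_type"] for node in nodes.values())


if __name__ == "__main__":
    # Only meaningful after `dbt parse` has produced a manifest.
    if Path("target/manifest.json").exists():
        for resource_type, count in summarize_manifest(load_manifest()).items():
            print(f"{resource_type}: {count}")
```

A step like this needs no warehouse access, which is why dbt parse failing without a reachable workspace blocks it.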

This used to work as expected with dbt-databricks, but it seems to have been broken in version 1.9.0.

Steps To Reproduce

Define a dummy http_path (or other) value in the dbt profile and run dbt parse. With dbt-databricks 1.8.7 this works as expected, whereas any version since 1.9.0 produces an unhandled exception.
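For reference, a dummy profile of this kind could be generated as in the sketch below. Every value in it (profile name, target name, host, http_path, token, schema) is a placeholder chosen for illustration and not the exact profile used in this report; the profile name must match the `profile:` entry in your dbt_project.yml.

```python
import tempfile
from pathlib import Path

# All values are placeholders; none of them points at a real workspace.
DUMMY_PROFILE = """\
my_project:
  target: ci
  outputs:
    ci:
      type: databricks
      host: dummy.cloud.databricks.com
      http_path: /sql/1.0/warehouses/dummy
      token: dummy-token
      schema: dummy_schema
"""


def write_dummy_profile(directory) -> Path:
    """Write a profiles.yml containing only dummy Databricks settings."""
    path = Path(directory) / "profiles.yml"
    path.write_text(DUMMY_PROFILE, encoding="utf-8")
    return path


if __name__ == "__main__":
    profiles_dir = tempfile.mkdtemp()
    print(write_dummy_profile(profiles_dir))
```

Running `dbt parse --profiles-dir <directory>` against the printed directory should then reproduce the behavior described above.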

Expected behavior

Generate a manifest.json without trying to connect to Databricks.

Screenshots and log output

Here is part of the traceback you get when running dbt parse:

13:43:50  Traceback (most recent call last):
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 218, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 247, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 294, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/cli/requires.py", line 320, in wrapper
    ctx.obj["manifest"] = parse_manifest(
                          ^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/parser/manifest.py", line 1895, in parse_manifest
    register_adapter(runtime_config, get_mp_context())
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/factory.py", line 203, in register_adapter
    FACTORY.register_adapter(config, mp_context, adapter_registered_log_level)
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/factory.py", line 118, in register_adapter
    adapter: Adapter = adapter_type(config, mp_context)  # type: ignore
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/impl.py", line 176, in __init__
    super().__init__(config, mp_context)
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/base/impl.py", line 271, in __init__
    self.connections = self.ConnectionManager(config, mp_context)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/connections.py", line 712, in __init__
    super().__init__(profile, mp_context)
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/connections.py", line 385, in __init__
    self.api_client = DatabricksApiClient.create(creds, 15 * 60)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/api_client.py", line 560, in create
    credentials_provider = credentials.authenticate(None)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/dbt/.venv/lib/python3.12/site-packages/dbt/adapters/databricks/credentials.py", line 262, in authenticate
    self._credentials_provider = provider.as_dict()

System information

The output of dbt --version:

Core:
  - installed: 1.9.2
  - latest:    1.9.2 - Up to date!

Plugins:
  - spark:      1.9.1 - Up to date!
  - databricks: 1.9.4 - Up to date!

Debian Bookworm, Python 3.12.8

Additional context

While trying to identify the origin of the issue, I installed the combination dbt-core==1.8.8 and dbt-databricks==1.9.4, which gives the error above, whereas the combination dbt-core==1.8.8 and dbt-databricks==1.8.7 does not. This seems to confirm that the root cause is a change in dbt-databricks, not in dbt-core.

I have not been able to pinpoint a specific change, since there was a significant refactor between these two versions. I suspect that the issue might be here: https://github.com/databricks/dbt-databricks/blob/40c23374210f814334bafe59ec03e5bf18c5d86b/dbt/adapters/databricks/connections.py#L186. This seems to have been introduced by #849.

It looks like DatabricksConnectionManager calls DatabricksApiClient.create in its initializer, which in turn establishes a connection to Databricks. Maybe that is a bit early in the process and should only happen later, when open is called?

According to the documentation of ConnectionManager:

open() is a classmethod that gets a connection object (which could be in any state, but will have a Credentials object with the attributes you defined above) and moves it to the 'open' state.

I would therefore not expect the connection to be opened before this function is called.
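To make the suggestion concrete, here is a minimal sketch of deferring client creation until first use. The classes below are hypothetical stand-ins written for illustration; they are not the real DatabricksConnectionManager or DatabricksApiClient, and this is not necessarily how the adapter should or does fix it.

```python
from typing import Callable, Optional


class FakeApiClient:
    """Stand-in for a client whose construction would authenticate."""

    created = 0  # counts constructions so the laziness is observable

    def __init__(self) -> None:
        FakeApiClient.created += 1


class LazyConnectionManager:
    """Hypothetical manager that builds its API client only on first use."""

    def __init__(self, client_factory: Callable[[], FakeApiClient]) -> None:
        # Store the factory only; nothing is created or authenticated here.
        self._client_factory = client_factory
        self._api_client: Optional[FakeApiClient] = None

    @property
    def api_client(self) -> FakeApiClient:
        # Create the client on first access and reuse it afterwards.
        if self._api_client is None:
            self._api_client = self._client_factory()
        return self._api_client

    def open(self) -> FakeApiClient:
        # The first point at which a real connection would be needed.
        return self.api_client


if __name__ == "__main__":
    manager = LazyConnectionManager(FakeApiClient)
    print(FakeApiClient.created)  # no client built by the initializer
    manager.open()
    print(FakeApiClient.created)  # client built once, on open()
```

With this shape, constructing the manager (as register_adapter does during dbt parse in the traceback above) would not touch the workspace.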

ghjklw avatar Feb 13 '25 14:02 ghjklw

Thanks for reporting. Will discuss with dbt-core.

benc-db avatar Feb 13 '25 21:02 benc-db

Thank you 😊

ghjklw avatar Feb 19 '25 17:02 ghjklw

@benc-db I see you have a fix for this. When do you plan to release a new version with this fix?

ajsquared avatar Feb 24 '25 17:02 ajsquared

@benc-db I'm having the exact same problem but with dbt-databricks 1.10.12:

Core:
  - installed: 1.10.11
  - latest:    1.10.11 - Up to date!

Plugins:
  - databricks: 1.10.12 - Up to date!

It keeps trying to open a connection to Databricks when running dbt parse.

mmonteiro18 avatar Sep 23 '25 16:09 mmonteiro18

@mmonteiro18 can you share a stack trace so that I can find where it's happening? I haven't been able to repro.

benc-db avatar Sep 23 '25 19:09 benc-db