dbt-databricks
cannot import name 'HeaderFactory' from 'databricks.sdk.core'
Describe the bug
Since the release of databricks-sdk 0.29.0, the dbt job running on our Databricks cluster with dbt-databricks==1.6.5 started to fail with the error below. After downgrading to databricks-sdk 0.28.0, by explicitly pinning it on the cluster, the error is no longer observed.
Steps To Reproduce
Install dbt-databricks==1.6.5 on a Databricks cluster and run your dbt models.
Expected behavior
The expected behavior would be to successfully import HeaderFactory.
Screenshots and log output
cannot import name 'HeaderFactory' from 'databricks.sdk.core' (/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/core.py)
07:09:00 Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/cli/requires.py", line 87, in wrapper
result, success = func(*args, **kwargs)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/cli/requires.py", line 72, in wrapper
return func(*args, **kwargs)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/cli/requires.py", line 140, in wrapper
profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/runtime.py", line 70, in load_profile
profile = Profile.render(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 436, in render
return cls.from_raw_profiles(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 401, in from_raw_profiles
return cls.from_raw_profile_info(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 355, in from_raw_profile_info
credentials: Credentials = cls._credentials_from_profile(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/config/profile.py", line 165, in _credentials_from_profile
cls = load_plugin(typename)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/adapters/factory.py", line 212, in load_plugin
return FACTORY.load_plugin(name)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/dbt/adapters/factory.py", line 58, in load_plugin
mod: Any = import_module("." + name, "dbt.adapters")
File "/usr/lib/python3.9/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "
System information
dbt-databricks==1.6.5
dbt-core==1.6.7
dbt-spark==1.6.0
python==3.9.5
Additional context
It seems dbt-databricks 1.6.5 should put an upper bound on the databricks-sdk version installed alongside it, instead of pulling the most recent release.
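For illustration, a bounded pin in the adapter's packaging metadata might look like the sketch below (the bounds and surrounding file are assumptions, not taken from the actual dbt-databricks source):

# Hypothetical excerpt from the adapter's setup.py; the bound is an
# illustrative assumption, not the real dependency specification.
install_requires = [
    "databricks-sdk<0.29.0",  # cap below the release that renamed HeaderFactory
]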
I'm having the same issue for version 1.7.5.
It's happening on my end as well. I mitigated it by pinning the previous version.
This issue is happening on my side as well, when executing the dbt source freshness command.
It is a library dependency issue: downgrading databricks-sdk from 0.29.0 to 0.28.0 solves it. It may be better to handle this dependency inside dbt-databricks.
databricks-sdk==0.28.0
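For reference, one way to apply this pin on a Databricks cluster is from a notebook cell before invoking dbt; %pip is the standard Databricks notebook install magic, though your cluster may manage libraries differently:

%pip install "dbt-databricks==1.6.5" "databricks-sdk==0.28.0"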
It seems a new databricks-sdk was released yesterday: https://pypi.org/project/databricks-sdk/
This bug only exists in old versions, as newer versions pin the SDK to a particular known-good version. What are the reasons you are pinned to old versions? I cannot backport a fix to a particular patch version, so it's more useful for me to find out why you're not upgrading than to file a bug that only exists in outdated patch versions.
Thank you for your response! I see, it doesn't occur in the latest version.
This is my personal opinion; I hope some aspect of it helps.
We have pinned the version of dbt-databricks because our processing stops working when the version is upgraded.
Do you think it would be better for users to pin the versions of both dbt-databricks and databricks-sdk on their side?
If the dependency between dbt-databricks and databricks-sdk is not constrained, another dependency issue could happen in the future, so it is probably safer for users to pin the dependency themselves.
The reason I personally cannot easily upgrade dbt-databricks is that some releases require changes not only to the version but also to the dbt configuration, so simply bumping the version does not work.
Changes: existing dbt-related files need to be updated. https://docs.getdbt.com/docs/dbt-versions/core-upgrade
Additionally, unlike most Python libraries, upgrading dbt-databricks may change the output data itself. Tools like https://github.com/dbt-labs/dbt-audit-helper might help in this regard, but mechanisms such as automation would also be needed to use them.
So updating the dbt version is not as easy as updating other Python libraries. I hope some of this information helps.
Thank you!
@case-k-git when you say the processing stops working, I need to know specifically how so that I can fix it. We will never be able to go back to an old patch version and change its code, so if you can't upgrade, you will never get fixes. The upgrade to 1.8 should not be a breaking change; if it is, I should know what broke so that I can restore it.
What are the reasons you are pinned to old versions?
In my case because I'm working in a heavily regulated environment where stability is key. Upgrading package versions requires paperwork for the change. I'll plan an upgrade in one of our future releases. I've pinned the databricks-sdk version to mitigate the issue - that was the path of least resistance :)
@benc-db I think something like the below would be all that is needed to fix this; it works on a local installation of mine, at least.
The real breaking change was the sdk renaming HeaderFactory to CredentialsProvider and credentials_provider to credentials_strategy. Everything else works fine once you fix the imports.
try:
    # Names as they exist in databricks-sdk <= 0.28.0
    from databricks.sdk.core import credentials_provider
    from databricks.sdk.core import CredentialsProvider
    from databricks.sdk.core import HeaderFactory
except ImportError:
    # databricks-sdk >= 0.29.0 renamed credentials_provider to
    # credentials_strategy and HeaderFactory to CredentialsProvider,
    # so alias the new names back to the old ones.
    from databricks.sdk.core import credentials_strategy as credentials_provider
    from databricks.sdk.core import CredentialsProvider
    from databricks.sdk.core import CredentialsProvider as HeaderFactory
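To make the rename concrete, here is a minimal sketch of how a custom provider is typically declared with the pre-0.29.0 decorator (the function name, registration name, and header contents are illustrative assumptions; on >= 0.29.0 the decorator is credentials_strategy instead):

from typing import Dict

from databricks.sdk.core import Config, credentials_provider

# Hypothetical provider authenticating with a personal access token;
# 'pat_example' and the required config fields are assumptions.
@credentials_provider('pat_example', ['host', 'token'])
def pat_example(cfg: Config):
    # The returned callable is what 0.28.0 calls a HeaderFactory and
    # 0.29.0 calls a CredentialsProvider: it produces auth headers.
    def inner() -> Dict[str, str]:
        return {'Authorization': f'Bearer {cfg.token}'}
    return inner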
Here are the relevant Python packages/versions I have installed, with which this small change allows everything to run smoothly.
databricks-connect==14.3.2
databricks-sdk==0.29.0
databricks-sql-connector==3.1.2
dbt-adapters==1.3.2
dbt-common==1.6.0
dbt-core==1.8.4
dbt-databricks==1.8.3
dbt-extractor==0.5.1
dbt-semantic-interfaces==0.5.1
dbt-spark==1.8.0
@benc-db
Thank you for your reply. Yes, I can confirm that an existing dbt operation fails after updating to the latest version even though it works on the old one, but I have not yet checked what needs to change in our old dbt config for the latest version. I will report back after checking.
Hi @benc-db
Sorry for the late response. I think there were some breaking changes for dbt tests: https://discourse.getdbt.com/t/upgrade-to-dbt-1-8-and-tests/14067
But the latest version keeps supporting the old behavior, so there are no breaking changes anymore: https://github.com/dbt-labs/dbt-core/issues/10564
Regarding the library dependency issue, our team uses pipenv to lock library versions:
pipenv run dbt XXXX
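For completeness, a minimal sketch of what such a pipenv lock could look like in a Pipfile (the pins mirror the workaround versions from this thread; treat them as illustrative):

[packages]
dbt-databricks = "==1.6.5"
databricks-sdk = "==0.28.0"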
Thank you.