dbt-databricks
Add support for Pandas >2.0
Describe the bug
NumPy 2.0 has been released, and it is not binary-compatible with older builds of pandas. However, dbt-databricks 1.8.3 only supports pandas up to version 2.0, so a fresh install can apparently end up with NumPy 2.0 next to an older pandas.
Workaround: pin numpy to 1.26.4 (the latest release before 2.0).
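To check whether an environment is affected before applying the workaround, something like the following could be used. It is only a minimal sketch: the helper names (`numpy_major`, `needs_numpy_pin`, `installed_version`) are made up for illustration and are not part of dbt or dbt-databricks, and it assumes that any NumPy 2.x install triggers the problem.

```python
from importlib import metadata
from typing import Optional


def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def needs_numpy_pin(version: Optional[str]) -> bool:
    """Hypothetical helper: True when the given NumPy version is 2.x or newer."""
    return version is not None and numpy_major(version) >= 2


def installed_version(package: str) -> Optional[str]:
    """Look up an installed distribution's version, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("numpy")
    if needs_numpy_pin(found):
        # Mirrors the workaround from this report: add numpy==1.26.4
        # to requirements.txt and reinstall.
        print(f"numpy {found} is installed; consider pinning numpy==1.26.4")
    else:
        print(f"numpy version: {found}")
```

If the check reports a 2.x version, adding `numpy==1.26.4` as an extra line in requirements.txt and reinstalling is the workaround described above.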
Steps To Reproduce
- For my devcontainer setup, I use requirements.txt with only a few entries:
dbt-databricks==1.8.3
sqlfluff
sqlfluff-templater-dbt
- Install the above dependencies.
- Run dbt deps
- Try to run any dbt command like dbt compile
Expected behavior
Successful dbt run.
Screenshots and log output
The outcome of the commands:
dbt deps has now installed the packages.
But any other command fails (in this case, dbt compile):
Quote from the logs:
13:07:21 Running with dbt=1.8.3
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
  File "/usr/local/bin/dbt", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/main.py", line 148, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 215, in wrapper
    profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
  File "/usr/local/lib/python3.9/site-packages/dbt/config/runtime.py", line 71, in load_profile
    profile = Profile.render(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 403, in render
    return cls.from_raw_profiles(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 369, in from_raw_profiles
    return cls.from_raw_profile_info(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 325, in from_raw_profile_info
    credentials: Credentials = cls._credentials_from_profile(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 149, in _credentials_from_profile
    cls = load_plugin(typename)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 239, in load_plugin
    return FACTORY.load_plugin(name)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 68, in load_plugin
    mod: Any = import_module("." + name, "dbt.adapters")
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/__init__.py", line 3, in <module>
    from dbt.adapters.databricks.connections import DatabricksConnectionManager  # noqa
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 26, in <module>
    from databricks.sql.client import Connection as DatabricksSQLConnection
  File "/usr/local/lib/python3.9/site-packages/databricks/sql/client.py", line 3, in <module>
    import pandas
  File "/usr/local/lib/python3.9/site-packages/pandas/__init__.py", line 23, in <module>
    from pandas.compat import (
  File "/usr/local/lib/python3.9/site-packages/pandas/compat/__init__.py", line 27, in <module>
    from pandas.compat.pyarrow import (
  File "/usr/local/lib/python3.9/site-packages/pandas/compat/pyarrow.py", line 8, in <module>
    import pyarrow as pa
  File "/usr/local/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
AttributeError: _ARRAY_API not found
13:07:21 Encountered an error:
numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
13:07:21 Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dbt/cli/requires.py", line 215, in wrapper
    profile = load_profile(flags.PROJECT_DIR, flags.VARS, flags.PROFILE, flags.TARGET, threads)
  File "/usr/local/lib/python3.9/site-packages/dbt/config/runtime.py", line 71, in load_profile
    profile = Profile.render(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 403, in render
    return cls.from_raw_profiles(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 369, in from_raw_profiles
    return cls.from_raw_profile_info(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 325, in from_raw_profile_info
    credentials: Credentials = cls._credentials_from_profile(
  File "/usr/local/lib/python3.9/site-packages/dbt/config/profile.py", line 149, in _credentials_from_profile
    cls = load_plugin(typename)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 239, in load_plugin
    return FACTORY.load_plugin(name)
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/factory.py", line 68, in load_plugin
    mod: Any = import_module("." + name, "dbt.adapters")
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/__init__.py", line 3, in <module>
    from dbt.adapters.databricks.connections import DatabricksConnectionManager  # noqa
  File "/usr/local/lib/python3.9/site-packages/dbt/adapters/databricks/connections.py", line 26, in <module>
    from databricks.sql.client import Connection as DatabricksSQLConnection
  File "/usr/local/lib/python3.9/site-packages/databricks/sql/client.py", line 3, in <module>
    import pandas
  File "/usr/local/lib/python3.9/site-packages/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/usr/local/lib/python3.9/site-packages/pandas/core/api.py", line 1, in <module>
    from pandas._libs import (
  File "/usr/local/lib/python3.9/site-packages/pandas/_libs/__init__.py", line 18, in <module>
    from pandas._libs.interval import Interval
  File "interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
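For anyone triaging similar reports, the messages in the log above can be matched with a small script. This is a rough sketch only: `ABI_MISMATCH_MARKERS` and `find_abi_markers` are hypothetical names, and the marker strings are simply copied from this log, so other NumPy or pandas versions may word them differently.

```python
from typing import List

# Substrings copied from the log output in this report (an assumption:
# other library versions may phrase these messages differently).
ABI_MISMATCH_MARKERS = (
    "compiled using NumPy 1.x cannot be run in NumPy 2",
    "numpy.dtype size changed, may indicate binary incompatibility",
    "_ARRAY_API not found",
)


def find_abi_markers(log_text: str) -> List[str]:
    """Return the known NumPy ABI-mismatch markers that occur in log_text."""
    return [marker for marker in ABI_MISMATCH_MARKERS if marker in log_text]


if __name__ == "__main__":
    sample = (
        "ValueError: numpy.dtype size changed, may indicate binary "
        "incompatibility. Expected 96 from C header, got 88 from PyObject"
    )
    for marker in find_abi_markers(sample):
        print("found:", marker)
```

A non-empty result suggests the failure is the NumPy 1.x/2.x binary mismatch described in this issue rather than a problem with the dbt project itself.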
System information
The output of dbt --version:
<output goes here>
The operating system you're using:
The output of python --version: