pandera
AttributeError: module 'pandas' has no attribute 'ArrowDtype' with release 0.20.1
Since release 0.20.1 I have been unable to validate with schemas that worked on 0.19.3.
Validation now fails with the following error: "AttributeError: module 'pandas' has no attribute 'ArrowDtype'"
Has anyone else seen this?
Can you share a minimally reproducible code snippet?
Also, what versions of pandas and pyarrow are you using?
I had the same error with pandas 1.4.4 and pyarrow 15.0.2. I think 0.20.* is no longer compatible with pandas < 2.
To get help, please provide minimally reproducible code along with the versions of pandas, pyarrow, and pandera.
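For anyone gathering the versions requested above, here is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical (not part of pandera or pandas); it looks up each distribution with `importlib.metadata.version` and reports missing packages instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pandas", "pyarrow", "pandera")):
    """Map each distribution name to its installed version string.

    Hypothetical helper for bug reports; packages that are not
    installed are reported as "not installed" rather than raising.
    """
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Print in pip-freeze style so the output can be pasted into an issue.
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}")
```

Running it inside the affected virtual environment gives the three versions in one paste-ready block, which is usually quicker to read than a full `pip freeze`.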
Hitting same issue. Repro:
$ cat << EOF > example.py
import pandas as pd
import pandera as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a series as input and
        # outputs a boolean or boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

validated_df = schema(df)
print(validated_df)
EOF
$ python3 -m venv venv && venv/bin/pip install "pandas<1.5" "numpy<2.0" pandera pyarrow && venv/bin/python example.py
...
<snip>
...
raise AttributeError(f"module 'pandas' has no attribute '{name}'")
AttributeError: module 'pandas' has no attribute 'ArrowDtype'
Full dependency list:
$ venv/bin/pip freeze
annotated-types==0.7.0
importlib_metadata==8.4.0
multimethod==1.10
mypy-extensions==1.0.0
numpy==1.26.4
packaging==24.1
pandas==1.4.4
pandera==0.20.3
pyarrow==17.0.0
pydantic==2.8.2
pydantic_core==2.20.1
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
typeguard==4.3.0
typing-inspect==0.9.0
typing_extensions==4.12.2
wrapt==1.16.0
zipp==3.20.1
Using python3.9
Uninstalling the pyarrow package fixes it, but that obviously doesn't help if you need pyarrow :)
A solution would be to have https://github.com/unionai-oss/pandera/blob/main/pandera/engines/pandas_engine.py#L53 check the version as well I guess.
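The guard suggested above could look roughly like the sketch below. This is an illustration under assumptions, not pandera's actual code or the eventual fix: the `pd_module` and `pyarrow_installed` parameters are hypothetical, and stand-in objects replace real pandas so the example runs without pandas or pyarrow installed. It checks for the attribute directly instead of comparing version numbers.

```python
from types import SimpleNamespace


def is_pyarrow_dtype(pd_module, pd_dtype, pyarrow_installed=True):
    """Return True only when pd_dtype is an instance of pd_module.ArrowDtype.

    Hypothetical signature for illustration. getattr() with a default
    avoids the AttributeError raised when the pandas module does not
    define ArrowDtype.
    """
    arrow_dtype = getattr(pd_module, "ArrowDtype", None)
    if not pyarrow_installed or arrow_dtype is None:
        return False
    return isinstance(pd_dtype, arrow_dtype)


class FakeArrowDtype:
    """Stand-in for pandas.ArrowDtype (assumption, for the demo only)."""


# Stand-ins for a pandas module without and with ArrowDtype.
old_pandas = SimpleNamespace()
new_pandas = SimpleNamespace(ArrowDtype=FakeArrowDtype)

print(is_pyarrow_dtype(old_pandas, None))              # False, no exception
print(is_pyarrow_dtype(new_pandas, FakeArrowDtype()))  # True
print(is_pyarrow_dtype(new_pandas, "int64"))           # False
```

Feature detection like this keeps working whatever pandas version introduced the attribute, whereas an explicit version comparison would have to be kept in sync with pandas releases.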
Can you provide the full stack trace, @FreekPaans? I want to see what the offending line is.
Sure!
Traceback (most recent call last):
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/pandas_engine.py", line 220, in dtype
    return engine.Engine.dtype(cls, data_type)
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/engine.py", line 271, in dtype
    raise TypeError(
TypeError: Data type 'None' not understood by Engine.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/freekpaans/code/pandera-example/example.py", line 23, in <module>
    validated_df = schema(df)
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/api/dataframe/container.py", line 340, in __call__
    return self.validate(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/api/pandas/container.py", line 126, in validate
    return self._validate(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/api/pandas/container.py", line 155, in _validate
    return self.get_backend(check_obj).validate(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/backends/pandas/container.py", line 100, in validate
    components = self.collect_schema_components(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/backends/pandas/container.py", line 322, in collect_schema_components
    pandas_engine.Engine.dtype(schema.dtype).type, BaseModel
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/pandas_engine.py", line 235, in dtype
    elif is_pyarrow_dtype(data_type):
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/pandas_engine.py", line 113, in is_pyarrow_dtype
    return isinstance(pd_dtype, pd.ArrowDtype)
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandas/__init__.py", line 261, in __getattr__
    raise AttributeError(f"module 'pandas' has no attribute '{name}'")
AttributeError: module 'pandas' has no attribute 'ArrowDtype'
Will cut a bugfix release by end of week.
Nice, thanks!