AttributeError: module 'pandas' has no attribute 'ArrowDtype' with release 0.20.1

Open cjthorley opened this issue 1 year ago • 7 comments
Since release 0.20.1 I am unable to validate using schemas that worked on 0.19.3.

My schemas now fail to validate with the following error: "AttributeError: module 'pandas' has no attribute 'ArrowDtype'"

Has anyone else seen this?

cjthorley avatar Jul 01 '24 10:07 cjthorley

Can you share a minimally reproducible code snippet?

cosmicBboy avatar Jul 03 '24 20:07 cosmicBboy

Also, which versions of pandas and pyarrow are you using?

cosmicBboy avatar Jul 04 '24 19:07 cosmicBboy

I had the same error with pandas 1.4.4 (pyarrow 15.0.2). I think pandera 0.20.* is no longer compatible with pandas < 2.

SimonDR-Boltzmann avatar Jul 11 '24 15:07 SimonDR-Boltzmann

To get help, please provide a minimally reproducible code snippet along with your versions of pandas, pyarrow, and pandera.

cosmicBboy avatar Jul 13 '24 20:07 cosmicBboy

Hitting same issue. Repro:

$ cat << EOF > example.py
import pandas as pd
import pandera as pa

# data to validate
df = pd.DataFrame({
    "column1": [1, 4, 0, 10, 9],
    "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
    "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
})

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a series as input and
        # outputs a boolean or boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

validated_df = schema(df)
print(validated_df)
EOF

$ python3 -m venv venv && venv/bin/pip install "pandas<1.5" "numpy<2.0" pandera pyarrow && venv/bin/python example.py
...
<snip>
...
    raise AttributeError(f"module 'pandas' has no attribute '{name}'")
AttributeError: module 'pandas' has no attribute 'ArrowDtype'

Full dependency list:

$ venv/bin/pip freeze
annotated-types==0.7.0
importlib_metadata==8.4.0
multimethod==1.10
mypy-extensions==1.0.0
numpy==1.26.4
packaging==24.1
pandas==1.4.4
pandera==0.20.3
pyarrow==17.0.0
pydantic==2.8.2
pydantic_core==2.20.1
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
typeguard==4.3.0
typing-inspect==0.9.0
typing_extensions==4.12.2
wrapt==1.16.0
zipp==3.20.1

Using python3.9

Uninstalling the pyarrow package fixes it, but that obviously doesn't help if you need pyarrow :)

One solution, I guess, would be to have https://github.com/unionai-oss/pandera/blob/main/pandera/engines/pandas_engine.py#L53 check the pandas version as well.
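
A minimal sketch of that idea, assuming a guard based on whether the attribute exists rather than an explicit version comparison. The error above shows that `pd.ArrowDtype` is missing on pandas 1.4.4, so the check can look the attribute up defensively and answer "not a pyarrow dtype" when it is absent. The `pd_module` parameter, `FakeArrowDtype`, and the two stand-in namespaces are hypothetical, added only so the example runs without pandas installed; this is not necessarily how pandera fixed it.

```python
from types import SimpleNamespace


def is_pyarrow_dtype(pd_module, pd_dtype) -> bool:
    """Return True only if pd_module exposes ArrowDtype and pd_dtype is one."""
    # getattr with a default absorbs the AttributeError that a pandas
    # build without ArrowDtype raises from its module-level __getattr__.
    arrow_dtype = getattr(pd_module, "ArrowDtype", None)
    if arrow_dtype is None:
        return False
    return isinstance(pd_dtype, arrow_dtype)


class FakeArrowDtype:
    """Hypothetical stand-in for pandas.ArrowDtype."""


# Stand-ins for a pandas without ArrowDtype and a pandas with it.
old_pandas = SimpleNamespace()
new_pandas = SimpleNamespace(ArrowDtype=FakeArrowDtype)

print(is_pyarrow_dtype(old_pandas, "int64"))
print(is_pyarrow_dtype(new_pandas, FakeArrowDtype()))
```

In real code the module argument would just be the imported `pandas`. Checking for the attribute avoids hard-coding a version threshold.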

FreekPaans avatar Aug 28 '24 07:08 FreekPaans

Can you provide the full stack trace, @FreekPaans? I want to see what the offending line is.

cosmicBboy avatar Aug 28 '24 13:08 cosmicBboy

Sure!

Traceback (most recent call last):
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/pandas_engine.py", line 220, in dtype
    return engine.Engine.dtype(cls, data_type)
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/engine.py", line 271, in dtype
    raise TypeError(
TypeError: Data type 'None' not understood by Engine.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/freekpaans/code/pandera-example/example.py", line 23, in <module>
    validated_df = schema(df)
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/api/dataframe/container.py", line 340, in __call__
    return self.validate(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/api/pandas/container.py", line 126, in validate
    return self._validate(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/api/pandas/container.py", line 155, in _validate
    return self.get_backend(check_obj).validate(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/backends/pandas/container.py", line 100, in validate
    components = self.collect_schema_components(
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/backends/pandas/container.py", line 322, in collect_schema_components
    pandas_engine.Engine.dtype(schema.dtype).type, BaseModel
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/pandas_engine.py", line 235, in dtype
    elif is_pyarrow_dtype(data_type):
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandera/engines/pandas_engine.py", line 113, in is_pyarrow_dtype
    return isinstance(pd_dtype, pd.ArrowDtype)
  File "/Users/freekpaans/code/pandera-example/venv/lib/python3.9/site-packages/pandas/__init__.py", line 261, in __getattr__
    raise AttributeError(f"module 'pandas' has no attribute '{name}'")
AttributeError: module 'pandas' has no attribute 'ArrowDtype'

FreekPaans avatar Aug 28 '24 13:08 FreekPaans

Will cut a bugfix release by end of week.

cosmicBboy avatar Aug 29 '24 00:08 cosmicBboy

Nice, thanks!

FreekPaans avatar Aug 29 '24 09:08 FreekPaans