pandera
fix pandas pyarrow string validation
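Fixes a bug where validating a column with the pyarrow string dtype (pd.ArrowDtype(pyarrow.string())) against a pyarrow.string schema column raised a SchemaError, even though the error message reported string[pyarrow] as both the expected and the actual type.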
Snippet:
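import pandas as pd
import pandera as pa
import pyarrow
# Build a one-column frame backed by the pyarrow string dtype
df = pd.DataFrame([{"foo": "bar"}], dtype=pd.ArrowDtype(pyarrow.string()))
df.info()
# Declare "foo" with pyarrow.string as its dtype and validate the frame
Schema = pa.DataFrameSchema({"foo": pa.Column(pyarrow.string)})
Schema.validate(df).info()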
Before:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   foo     1 non-null      string[pyarrow]
dtypes: string[pyarrow](1)
memory usage: 139.0 bytes
Traceback (most recent call last):
  File "/home/jovyan/work/pandera/scraps.py", line 61, in <module>
    Schema.validate(df).info()
    ^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/api/pandas/container.py", line 125, in validate
    return self._validate(
           ^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/api/pandas/container.py", line 154, in _validate
    return self.get_backend(check_obj).validate(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/backends/pandas/container.py", line 104, in validate
    error_handler = self.run_checks_and_handle_errors(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/backends/pandas/container.py", line 179, in run_checks_and_handle_errors
    error_handler.collect_error(
  File "/home/jovyan/work/pandera/pandera/api/base/error_handler.py", line 54, in collect_error
    raise schema_error from original_exc
  File "/home/jovyan/work/pandera/pandera/backends/pandas/container.py", line 200, in run_schema_component_checks
    result = schema_component.validate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/api/dataframe/components.py", line 163, in validate
    return self.get_backend(check_obj).validate(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/backends/pandas/components.py", line 132, in validate
    validate_column(check_obj, column_name)
  File "/home/jovyan/work/pandera/pandera/backends/pandas/components.py", line 92, in validate_column
    error_handler.collect_error(
  File "/home/jovyan/work/pandera/pandera/api/base/error_handler.py", line 54, in collect_error
    raise schema_error from original_exc
  File "/home/jovyan/work/pandera/pandera/backends/pandas/components.py", line 72, in validate_column
    validated_check_obj = super(ColumnBackend, self).validate(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/backends/pandas/array.py", line 81, in validate
    error_handler = self.run_checks_and_handle_errors(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/pandera/pandera/backends/pandas/array.py", line 145, in run_checks_and_handle_errors
    error_handler.collect_error(
  File "/home/jovyan/work/pandera/pandera/api/base/error_handler.py", line 54, in collect_error
    raise schema_error from original_exc
pandera.errors.SchemaError: expected series 'foo' to have type string[pyarrow], got string[pyarrow]
After:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   foo     1 non-null      string[pyarrow]
dtypes: string[pyarrow](1)
memory usage: 139.0 bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   foo     1 non-null      string[pyarrow]
dtypes: string[pyarrow](1)
memory usage: 139.0 bytes
Codecov Report
All modified and coverable lines are covered by tests ✅
Project coverage is 83.27%. Comparing base (4df61da) to head (954b6c5). Report is 91 commits behind head on main.
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1636       +/-   ##
===========================================
- Coverage   94.29%   83.27%   -11.02%
===========================================
  Files          91      116       +25
  Lines        7024     8646     +1622
===========================================
+ Hits         6623     7200      +577
- Misses        401     1446     +1045
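The percentages in the diff follow from the hit and miss counts. This is a minimal sketch, assuming Codecov's default settings: coverage computed as hits / (hits + misses + partials), with precision 2 and rounding down. The coverage_pct helper is hypothetical. Rounding down explains why 7200 / 8646 (about 83.2755%) is reported as 83.27 rather than 83.28.

```python
def coverage_pct(hits: int, misses: int, partials: int = 0, precision: int = 2) -> float:
    """Hypothetical helper: coverage = hits / (hits + misses + partials), rounded down."""
    lines = hits + misses + partials
    scale = 10 ** precision
    # Integer floor division rounds down exactly, avoiding float rounding surprises.
    return (hits * 100 * scale // lines) / scale


base = coverage_pct(hits=6623, misses=401)   # main
head = coverage_pct(hits=7200, misses=1446)  # #1636
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# 94.29% -> 83.27% (-11.02%)
```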
@cosmicBboy could the uv cache be bugged? I remember seeing something similar a few weeks back. We could try clearing it with uv cache clean.