
polars.exceptions.ColumnNotFoundError when coerce=True and Optional field is missing

Open · antonioalegria opened this issue 1 year ago · 1 comment

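Describe the bug

With coerce=True, validating a DataFrame that is missing a column declared as Optional[...] raises polars.exceptions.ColumnNotFoundError instead of skipping the absent column. Without coerce=True, the same DataFrame validates.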

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [ ] (optional) I have confirmed this bug exists on the main branch of pandera.


Code Sample, a copy-pastable example

```python
from pandera.polars import Field # type: ignore
from pandera.polars import DataFrameModel # type: ignore

from typing import Optional

import polars as pl


class MyModel(DataFrameModel):
    a: Optional[str] = Field(description="some description", nullable=True)
    b: Optional[str] = Field(description="some description") # BOOM
    c: Optional[str] = Field(description="some description", str_contains=".", nullable=True)
    d: Optional[str] = Field(description="some description", str_contains=".") # BOOM

df = pl.DataFrame({})
schema = MyModel.to_schema()
schema.strict = True
schema.coerce = True # -> without this it works
print(schema.validate(df)) # BOOM
```

Exception:

```
(.venv) antonioalegria@shiro dojo % /Users/antonioalegria/Developer/dojo/.venv/bin/python /Users/antonioalegria/Developer/dojo/dojo/test.py
Traceback (most recent call last):
  File ".../test.py", line 22, in <module>
    print(schema.validate(df))
          ^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/api/polars/container.py", line 58, in validate
    output = self.get_backend(check_obj).validate(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/backends/polars/container.py", line 63, in validate
    check_obj = parser(check_obj, *args)
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/backends/polars/container.py", line 396, in coerce_dtype
    check_obj = self._coerce_dtype_helper(check_obj, schema)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/backends/polars/container.py", line 455, in _coerce_dtype_helper
    obj = getattr(col_schema.dtype, coerce_fn)(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.12/site-packages/pandera/engines/polars_engine.py", line 181, in try_coerce
    lf.collect()
  File ".../.venv/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 2034, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ColumnNotFoundError: a
```

Expected behavior

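The DataFrame should validate, just as it does when coerce is left off: Optional columns that are absent from the DataFrame should not raise ColumnNotFoundError during coercion.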
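A minimal, stdlib-only sketch of the expected behavior follows, assuming that coercion should simply skip an optional field whose column is absent (which matches what reporters see with coerce=False). This is a toy model, not pandera's backend: a "frame" is a dict mapping column names to lists of values, and `FieldSpec`, `coerce_columns`, and `ColumnNotFound` are hypothetical names. It does not show how the linked PR actually fixes the bug.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical toy frame: column name -> list of values (None = null).
Frame = Dict[str, List[Optional[object]]]


@dataclass
class FieldSpec:
    """Hypothetical stand-in for one schema field (not a pandera class)."""
    cast: Callable[[object], object]  # per-value coercion function
    required: bool = True  # False models an Optional[...] annotation


class ColumnNotFound(KeyError):
    """Hypothetical analogue of polars.exceptions.ColumnNotFoundError."""


def coerce_columns(frame: Frame, schema: Dict[str, FieldSpec]) -> Frame:
    """Coerce every present column; skip absent optional ones."""
    out = dict(frame)
    for name, spec in schema.items():
        if name not in frame:
            if spec.required:
                # A required column really is missing: fail loudly.
                raise ColumnNotFound(name)
            # Optional and absent: nothing to coerce, so skip it.
            continue
        # Cast non-null values and keep nulls as-is.
        out[name] = [None if v is None else spec.cast(v) for v in frame[name]]
    return out


schema = {"a": FieldSpec(str, required=False), "b": FieldSpec(str, required=False)}
print(coerce_columns({}, schema))  # {}
print(coerce_columns({"a": [1, None]}, schema))  # {'a': ['1', None]}
```

Keeping the presence check inside the coercion loop means a missing required column still fails, while missing optional columns pass through untouched.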

Desktop

  • OS: macOS 14.6.1
  • Python: 3.12.4
  • polars-lts-cpu: 1.6.0
  • pandera: 0.20.3


antonioalegria, Sep 07 '24 11:09

I have the same problem. When the schema does not coerce types, the Optional annotation works as expected: the DataFrame does not need to contain that column. With coercion enabled, the schema requires the column.

IsaiasGutierrezCruz, Sep 27 '24 22:09

Same issue here: enabling coerce on any column effectively makes that column required.

c0dearm, Dec 21 '24 17:12

This is a duplicate of https://github.com/unionai-oss/pandera/issues/1660 and should be addressed by https://github.com/unionai-oss/pandera/pull/1871. I'll cut a new release in the next few days.

cosmicBboy, Dec 22 '24 16:12