Subclassing `pandera.api.dataframe.model.DataFrameModel` raises `KeyError` on annotated but uninitialized fields starting with an underscore

Open adzcai opened this issue 1 year ago • 0 comments

Describe the bug

If I create a generic subclass of pandera.api.dataframe.model.DataFrameModel that has an annotated but uninitialized field whose name starts with an underscore, and then parameterize it with concrete types, DataFrameModel.__class_getitem__ raises a KeyError while collecting the fields in DataFrameModel._collect_fields.

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [x] (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

from typing import Generic
from pandera.api.dataframe.model import DataFrameModel, TDataFrame, TSchema

class Schema(DataFrameModel[TDataFrame, TSchema], Generic[TDataFrame, TSchema]):
    _foo: int

from pandera.api.pandas.container import DataFrameSchema
import pandas as pd

x: Schema[pd.DataFrame, DataFrameSchema]

The last line raises the following error:

KeyError                                  Traceback (most recent call last)
Cell In[4], line 10
      7 from pandera.api.pandas.container import DataFrameSchema
      8 import pandas as pd
---> 10 x: Schema[pd.DataFrame, DataFrameSchema]

File ~/micromamba/envs/virgo/lib/python3.12/site-packages/pandera/api/dataframe/model.py:189, in DataFrameModel.__class_getitem__(cls, item)
    187 param_dict: Dict[TypeVar, Type[Any]] = dict(zip(__parameters__, item))
    188 extra: Dict[str, Any] = {"__annotations__": {}}
--> 189 for field, (annot_info, field_info) in cls._collect_fields().items():
    190     if isinstance(annot_info.arg, TypeVar):
    191         if annot_info.arg in param_dict:

File ~/micromamba/envs/virgo/lib/python3.12/site-packages/pandera/api/dataframe/model.py:359, in DataFrameModel._collect_fields(cls)
    357 fields = {}
    358 for field_name, annotation in annotations.items():
--> 359     field = attrs[field_name]  # __init_subclass__ guarantees existence
    360     if not isinstance(field, FieldInfo):
    361         raise SchemaInitError(
    362             f"'{field_name}' can only be assigned a 'Field', "
    363             + f"not a '{type(field)}.'"
    364         )

KeyError: '_foo'

Expected behavior

The field shouldn't be collected since it starts with an underscore.

Desktop (please complete the following information):

  • OS: macOS
  • Browser: Safari
  • Version: 0.20.3

Additional context

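It seems that _foo is never in the dict attrs = cls._get_model_attrs(). That method looks through the __dict__s of the superclasses, and because _foo is annotated but never assigned a value, it does not appear in any of them. The annotations dict still contains _foo, so the lookup attrs[field_name] in _collect_fields raises the KeyError. Maybe we should filter out non-fields from the annotations dict as well.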
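
To illustrate the mismatch without pandera installed, here is a minimal plain-Python sketch. The names Base, Model, collect_attrs, and collect_fields are hypothetical stand-ins written for this example. They are not pandera's code, and the filter shown is only one possible shape for a fix, assuming that skipping underscore-prefixed or unassigned names is the desired behavior.

```python
from typing import Any, Dict


class Base:
    """Hypothetical stand-in for a model base class (not pandera's)."""


class Model(Base):
    _foo: int      # annotated only: Python creates no class attribute
    bar: int = 0   # annotated and assigned: stored in the class __dict__


def collect_attrs(cls: type) -> Dict[str, Any]:
    """Merge the __dict__ of every class in the MRO, base classes first."""
    attrs: Dict[str, Any] = {}
    for klass in reversed(cls.__mro__):
        attrs.update(vars(klass))
    return attrs


def collect_fields(cls: type) -> Dict[str, Any]:
    """Look up each annotated name, skipping private or unassigned ones."""
    attrs = collect_attrs(cls)
    fields: Dict[str, Any] = {}
    for name in cls.__annotations__:
        # Assumed filter: without it, attrs[name] raises KeyError for _foo.
        if name.startswith("_") or name not in attrs:
            continue
        fields[name] = attrs[name]
    return fields
```

Here `"_foo" in Model.__annotations__` is true while `"_foo" in collect_attrs(Model)` is false, which is the same gap that triggers the KeyError above. With the filter in place, collect_fields(Model) only sees bar.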

adzcai, Jul 26 '24 23:07