pandera
Subclassing from `pandera.api.dataframe.model.DataFrameModel` errors on annotated but not initialized fields starting with an underscore
Describe the bug
If I create a generic subclass of `pandera.api.dataframe.model.DataFrameModel` that has an annotated but unassigned field whose name starts with an underscore, and then parametrize it with concrete type arguments, `DataFrameModel.__class_getitem__` raises a `KeyError` when it collects the fields via `cls._collect_fields()`.
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [x] (optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample, a copy-pastable example
from typing import Generic
from pandera.api.dataframe.model import DataFrameModel, TDataFrame, TSchema
class Schema(DataFrameModel[TDataFrame, TSchema], Generic[TDataFrame, TSchema]):
    _foo: int
from pandera.api.pandas.container import DataFrameSchema
import pandas as pd
x: Schema[pd.DataFrame, DataFrameSchema]
The last line raises the following error:
KeyError Traceback (most recent call last)
Cell In[4], line 10
7 from pandera.api.pandas.container import DataFrameSchema
8 import pandas as pd
---> 10 x: Schema[pd.DataFrame, DataFrameSchema]
File ~/micromamba/envs/virgo/lib/python3.12/site-packages/pandera/api/dataframe/model.py:189, in DataFrameModel.__class_getitem__(cls, item)
187 param_dict: Dict[TypeVar, Type[Any]] = dict(zip(__parameters__, item))
188 extra: Dict[str, Any] = {"__annotations__": {}}
--> 189 for field, (annot_info, field_info) in cls._collect_fields().items():
190 if isinstance(annot_info.arg, TypeVar):
191 if annot_info.arg in param_dict:
File ~/micromamba/envs/virgo/lib/python3.12/site-packages/pandera/api/dataframe/model.py:359, in DataFrameModel._collect_fields(cls)
357 fields = {}
358 for field_name, annotation in annotations.items():
--> 359 field = attrs[field_name] # __init_subclass__ guarantees existence
360 if not isinstance(field, FieldInfo):
361 raise SchemaInitError(
362 f"'{field_name}' can only be assigned a 'Field', "
363 + f"not a '{type(field)}.'"
364 )
KeyError: '_foo'
Expected behavior
Parametrizing the class should succeed. `_foo` shouldn't be collected as a field, since its name starts with an underscore.
Desktop (please complete the following information):
- OS: macOS
- Browser: Safari
- Version: 0.20.3
Additional context
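It looks like `_foo` is never in the dict `attrs = cls._get_model_attrs()`. That method looks through the `__dict__`s of the superclasses, but `_foo` is only annotated and never assigned a value, so no class attribute exists for it. `_collect_fields` still iterates over every annotated name, so `attrs[field_name]` raises `KeyError: '_foo'`.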
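The underlying Python behavior can be reproduced without pandera. The following is a minimal sketch using a plain class (the names `Base` and `bar` are made up for illustration): an annotation without an assignment is recorded in `__annotations__` but creates no entry in the class `__dict__`.

```python
class Base:
    _foo: int      # annotated only: recorded in __annotations__, no class attribute
    bar: int = 1   # annotated and assigned: also stored in the class __dict__


annotations = Base.__annotations__
attrs = vars(Base)  # the class __dict__ (a read-only mapping view)

in_annotations = "_foo" in annotations
in_attrs = "_foo" in attrs

print(in_annotations, in_attrs)
```

Any code that iterates over the annotations and indexes into the attributes with the same key will therefore fail on `_foo`.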
Maybe `_collect_fields` should filter non-fields out of the annotations dict as well.
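A rough sketch of that idea, written against plain dicts rather than pandera's internals. The helper names `is_public_field` and `collect_fields` are hypothetical and are not pandera's API; the only assumption is that a leading underscore marks a name as a non-field.

```python
from typing import Any, Dict, Tuple


def is_public_field(name: str) -> bool:
    """Hypothetical helper: names with a leading underscore are not fields."""
    return not name.startswith("_")


def collect_fields(
    annotations: Dict[str, Any], attrs: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Pair each public annotated name with its attribute, skipping private names."""
    return {
        name: (annotation, attrs[name])
        for name, annotation in annotations.items()
        if is_public_field(name)  # filter before indexing into attrs
    }


# Stand-ins for the annotations dict and the collected model attributes.
annotations = {"_foo": int, "bar": int}
attrs = {"bar": "field-info-placeholder"}

fields = collect_fields(annotations, attrs)
print(fields)
```

Filtering before the lookup means `_foo` is never used as a key into `attrs`, so the `KeyError` cannot occur for private names.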