pandera
pandera copied to clipboard
Datatype coercion fails to occur if DataFrame Model Config `coerce=True` defined in inherited Config class
Describe the bug
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [x] (optional) I have confirmed this bug exists on the master branch of pandera.
For my project to scale up, I want to define a project level base Config class from which all other Model Config classes inherit the defaults from. Unfortunately this coercion fails to occur, even though on inspection of the class attributes, config=True
. Oddly, it does work if we first call .to_schema()
before calling .validate()
, i.e. Model.to_schema().validate(df)
, but NOT if the Config superclass does not inherit from Pandera BaseConfig
. Please see the example below.
Code Sample, a copy-pastable example
import pandera as pa
import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':['x','y','z']})
# this behaves as expected, converting to the pandas datatypes
class SchemaNoInherit(pa.DataFrameModel):
a: pd.Float64Dtype
b: pd.StringDtype
class Config:
coerce=True
print(SchemaNoInherit(df).dtypes)
# Schema with BaseConfig inheritance and overriding coercion setting. This also behaves as expected, coercion to pandas datatypes occurs.
from pandera.api.pandas.model_config import BaseConfig
class SchemaBaseInherit(pa.DataFrameModel):
a: pd.Float64Dtype
b: pd.StringDtype
class Config(BaseConfig):
coerce=True
print(SchemaBaseInherit(df).dtypes)
class ProjectBaseConfig(BaseConfig):
coerce=True
# explicitly overriding coerce with inheritance does work as expected
class SchemaProjBaseInheritExplicit(pa.DataFrameModel):
a: pd.Float64Dtype
b: pd.StringDtype
class Config(ProjectBaseConfig):
coerce=True
# print(SchemaProjBaseInheritExplicit(df).dtypes)
# this does not work as expected, in fact it fails the validation
class SchemaProjBaseInherit(pa.DataFrameModel):
a: pd.Float64Dtype
b: pd.StringDtype
class Config(ProjectBaseConfig):
pass
print(SchemaProjBaseInherit(df).dtypes)
# what about going to schema first?
print(SchemaBaseInherit.to_schema().validate(df).dtypes)
# this does work.
# Finally, what if the project base config class doesnt inherit from `BaseConfig`? This also does not work.
class ProjectBaseConfigNoInherit:
coerce=True
class SchemaProjBaseConfigNoInherit(pa.DataFrameModel):
a: pd.Float64Dtype
b: pd.StringDtype
class Config(ProjectBaseConfigNoInherit):
pass
print(SchemaProjBaseConfigNoInherit.validate(df).dtypes)
# Interestingly, this fails
print(SchemaProjBaseConfigNoInherit.to_schema().validate(df).dtypes)
Expected behavior
I expect that a DataFrame Model with a Config class inheriting from another class will produce the expected behavior of coerce=True
.
Desktop (please complete the following information):
MacOS Ventura 13.2.1