pandera icon indicating copy to clipboard operation
pandera copied to clipboard

Datatype coercion fails to occur if DataFrame Model Config `coerce=True` defined in inherited Config class

Open jonathanstathakis opened this issue 1 year ago • 0 comments

Describe the bug

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [x] (optional) I have confirmed this bug exists on the master branch of pandera.

For my project to scale up, I want to define a project level base Config class from which all other Model Config classes inherit the defaults from. Unfortunately this coercion fails to occur, even though on inspection of the class attributes, config=True. Oddly, it does work if we first call .to_schema() before calling .validate(), i.e. Model.to_schema().validate(df), but NOT if the Config superclass does not inherit from Pandera BaseConfig. Please see the example below.

Code Sample, a copy-pastable example

import pandera as pa
import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':['x','y','z']})

# this behaves as expected, converting to the pandas datatypes
class SchemaNoInherit(pa.DataFrameModel):
    a: pd.Float64Dtype
    b: pd.StringDtype
    
    class Config:
        coerce=True

print(SchemaNoInherit(df).dtypes)

# Schema with BaseConfig inheritance and overriding coercion setting. This also behaves as expected, coercion to pandas datatypes occurs.

from pandera.api.pandas.model_config import BaseConfig

class SchemaBaseInherit(pa.DataFrameModel):
    a: pd.Float64Dtype
    b: pd.StringDtype
    
    class Config(BaseConfig):
        coerce=True

print(SchemaBaseInherit(df).dtypes)

class ProjectBaseConfig(BaseConfig):
    coerce=True

# explicitly overriding coerce with inheritance does work as expected

class SchemaProjBaseInheritExplicit(pa.DataFrameModel):
    a: pd.Float64Dtype
    b: pd.StringDtype
    
    class Config(ProjectBaseConfig):
        coerce=True

# print(SchemaProjBaseInheritExplicit(df).dtypes)

# this does not work as expected, in fact it fails the validation

class SchemaProjBaseInherit(pa.DataFrameModel):
    a: pd.Float64Dtype
    b: pd.StringDtype
    
    class Config(ProjectBaseConfig):
        pass
        
print(SchemaProjBaseInherit(df).dtypes)

# what about going to schema first?

print(SchemaBaseInherit.to_schema().validate(df).dtypes)

# this does work.

# Finally, what if the project base config class doesnt inherit from `BaseConfig`? This also does not work.

class ProjectBaseConfigNoInherit:
    coerce=True

class SchemaProjBaseConfigNoInherit(pa.DataFrameModel):
    a: pd.Float64Dtype
    b: pd.StringDtype
    
    class Config(ProjectBaseConfigNoInherit):
        pass
    
print(SchemaProjBaseConfigNoInherit.validate(df).dtypes)

# Interestingly, this fails

print(SchemaProjBaseConfigNoInherit.to_schema().validate(df).dtypes)

Expected behavior

I expect that a DataFrame Model with a Config class inheriting from another class will produce the expected behavior of coerce=True.

Desktop (please complete the following information):

MacOS Ventura 13.2.1

jonathanstathakis avatar Jan 26 '24 13:01 jonathanstathakis