pandera icon indicating copy to clipboard operation
pandera copied to clipboard

fix: SQLModel table model not validated

Open AlpAribal opened this issue 1 year ago • 4 comments
trafficstars

When using a SQLModel class with table=True as dtype, schema is not validated. This is because such SQLModel classes do not validate data at init time (see here). This PR solves this by explicitly calling model_validate/parse_obj instead of instantiating the class.

Minimal example (python=3.8 sqlmodel=0.0.19 pandera=0.19.3):

# example.py
import pandas as pd
import pandera as pa
from pandera.engines.pandas_engine import PydanticModel
from sqlmodel import SQLModel, Field

class Record(SQLModel, table=True):
    name: str = Field(primary_key=True)

class PydanticSchema(pa.DataFrameModel):
    class Config:
        dtype = PydanticModel(Record)

df = pd.DataFrame({"name": [3]})
PydanticSchema.validate(df)

Without the fix, validation succeeds while only emitting a warning from model_dump():

$ python example.py 
UserWarning: Pydantic serializer warnings:
  Expected `str` but got `int` - serialized value may not be as expected

With the fix, a SchemaError is raised:

$ python example.py 
...
pandera.errors.SchemaError: Error while coercing 'PydanticSchema' to type <class '__main__.Record'>: Could not coerce <class 'pandas.core.frame.DataFrame'> data_container into type <class '__main__.Record'>
   index failure_case
0      0  {'name': 1}

AlpAribal avatar Jun 19 '24 22:06 AlpAribal

thanks @AlpAribal do you mind rebasing these changes onto the main branch? failing tests should go away aftaer that

cosmicBboy avatar Jun 22 '24 14:06 cosmicBboy

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.55%. Comparing base (812b2a8) to head (59e7f30). Report is 151 commits behind head on main.

Files with missing lines Patch % Lines
pandera/engines/pandas_engine.py 66.66% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1696      +/-   ##
==========================================
- Coverage   94.28%   93.55%   -0.73%     
==========================================
  Files          91      117      +26     
  Lines        7013     8843    +1830     
==========================================
+ Hits         6612     8273    +1661     
- Misses        401      570     +169     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jun 22 '24 14:06 codecov[bot]

linter is making some valid complaints:

************* Module pandera.engines.pandas_engine
pandera/engines/pandas_engine.py:1320:26: E1101: Instance of 'Field' has no 'model_validate' member (no-member)
pandera/engines/pandas_engine.py:13[22](https://github.com/unionai-oss/pandera/actions/runs/9628170517/job/26571671455?pr=1696#step:10:23):26: E1101: Instance of 'Field' has no 'parse_obj' member (no-member)

You might have to do something like:

            try:
                _type = typing.cast(Type[BaseModel], self.type)
                # pylint: disable=not-callable
                if PYDANTIC_V2:
                    row = self.type.model_validate(row).model_dump()
                else:
                    row = self.type.parse_obj(row).dict()

cosmicBboy avatar Jun 23 '24 21:06 cosmicBboy

You might have to do something like:

            try:
                _type = typing.cast(Type[BaseModel], self.type)
                # pylint: disable=not-callable
                if PYDANTIC_V2:
                    row = self.type.model_validate(row).model_dump()
                else:
                    row = self.type.parse_obj(row).dict()

Unfortunately, casting did not work, pylint still complains. This SO answer hints that pylint discards the type hint and uses the actual value (Field in this case). I was able to pass pylint by moving the declaration of self.type into __init__, but I am not sure if there can be any unwanted side effects of this. Another alternative is to ignore this particular pylint warning.

AlpAribal avatar Jun 24 '24 20:06 AlpAribal

@AlpAribal I'm okay with ignoring the pylint warning

cosmicBboy avatar Jul 04 '24 19:07 cosmicBboy