pandera
pandera copied to clipboard
fix: SQLModel table model not validated
When using a SQLModel class with table=True as dtype, schema is not validated. This is because such SQLModel classes do not validate data at init time (see here). This PR solves this by explicitly calling model_validate/parse_obj instead of instantiating the class.
Minimal example (python=3.8 sqlmodel=0.0.19 pandera=0.19.3):
# example.py
import pandas as pd
import pandera as pa
from pandera.engines.pandas_engine import PydanticModel
from sqlmodel import SQLModel, Field
class Record(SQLModel, table=True):
name: str = Field(primary_key=True)
class PydanticSchema(pa.DataFrameModel):
class Config:
dtype = PydanticModel(Record)
df = pd.DataFrame({"name": [3]})
PydanticSchema.validate(df)
Without the fix, validation succeeds while only emitting a warning from model_dump():
$ python example.py
UserWarning: Pydantic serializer warnings:
Expected `str` but got `int` - serialized value may not be as expected
With the fix, a SchemaError is raised:
$ python example.py
...
pandera.errors.SchemaError: Error while coercing 'PydanticSchema' to type <class '__main__.Record'>: Could not coerce <class 'pandas.core.frame.DataFrame'> data_container into type <class '__main__.Record'>
index failure_case
0 0 {'name': 1}
thanks @AlpAribal do you mind rebasing these changes onto the main branch? failing tests should go away aftaer that
Codecov Report
Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
Project coverage is 93.55%. Comparing base (
812b2a8) to head (59e7f30). Report is 151 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| pandera/engines/pandas_engine.py | 66.66% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #1696 +/- ##
==========================================
- Coverage 94.28% 93.55% -0.73%
==========================================
Files 91 117 +26
Lines 7013 8843 +1830
==========================================
+ Hits 6612 8273 +1661
- Misses 401 570 +169
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
linter is making some valid complaints:
************* Module pandera.engines.pandas_engine
pandera/engines/pandas_engine.py:1320:26: E1101: Instance of 'Field' has no 'model_validate' member (no-member)
pandera/engines/pandas_engine.py:13[22](https://github.com/unionai-oss/pandera/actions/runs/9628170517/job/26571671455?pr=1696#step:10:23):26: E1101: Instance of 'Field' has no 'parse_obj' member (no-member)
You might have to do something like:
try:
_type = typing.cast(Type[BaseModel], self.type)
# pylint: disable=not-callable
if PYDANTIC_V2:
row = self.type.model_validate(row).model_dump()
else:
row = self.type.parse_obj(row).dict()
You might have to do something like:
try: _type = typing.cast(Type[BaseModel], self.type) # pylint: disable=not-callable if PYDANTIC_V2: row = self.type.model_validate(row).model_dump() else: row = self.type.parse_obj(row).dict()
Unfortunately, casting did not work, pylint still complains. This SO answer hints that pylint discards the type hint and uses the actual value (Field in this case). I was able to pass pylint by moving the declaration of self.type into __init__, but I am not sure if there can be any unwanted side effects of this. Another alternative is to ignore this particular pylint warning.
@AlpAribal I'm okay with ignoring the pylint warning