Can you use Pydantic Field Aliasing with Pandera / PydanticModel schema definitions?

Open mcmasty opened this issue 2 years ago • 4 comments

How to use Pydantic Field Alias with pandera

I am processing a CSV and I am trying to use Pandera to validate the data. The names in the CSV header row are not what I want the names in my model to be. I haven't figured out how to achieve field aliasing. Any suggestions?

Here is a snippet that reproduces the error I am getting.

import io
import pydantic
import pandas as pd
import pandera as pa

from pandera.engines.pandas_engine import PydanticModel


class AliasedRecord(pydantic.BaseModel):
    name: str = pydantic.Field(alias="Name")
    amt_in_local: float = pydantic.Field(alias="Amount in local currency")

class AliasDFSchema(pa.DataFrameModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(AliasedRecord)
        strict = True
        coerce = True  # this is required, otherwise a SchemaInitError is raised

# Direct Pydantic Model Validation
ar_m = AliasedRecord.model_validate({"Name":"Foo", "Amount in local currency": 1.32})
print(f"My Model is: {ar_m}")

# Now try validating a DataFrame
# Generate data similar to the source CSV
f = io.StringIO('Name,Amount in local currency\nfoo,1.32\nbar,3.34')
df1 = pd.read_csv(f)
validated_df = AliasDFSchema(df1)

Output

The successful Model:


My Model is: name='Foo' amt_in_local=1.32

The DataFrame / Pandera error ...

... bunch of stuff removed for brevity  

SchemaError: column 'Name' not in DataFrameSchema {}

df1 is correctly created

mcmasty avatar Oct 18 '23 22:10 mcmasty

Looks like PydanticModel doesn't interact well with strict=True. This works:

class AliasDFSchema(pa.DataFrameModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(AliasedRecord)
        coerce = True  # this is required, otherwise a SchemaInitError is raised

One potential fix would be to update the DataFrameSchema.__init__ method to special-case dtype = PydanticModel: pull the column names/aliases out of the pydantic model and create a column dictionary from them.
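A minimal sketch of the "pull out the column names/aliases" step, assuming pydantic v2 (where `model_fields` maps each field name to a `FieldInfo` carrying `alias` and `annotation`). The helper `columns_from_model` is hypothetical and not part of pandera; it only builds the mapping a column dictionary could be derived from, and does not touch DataFrameSchema.__init__ itself.

```python
from typing import Any

import pydantic


class AliasedRecord(pydantic.BaseModel):
    name: str = pydantic.Field(alias="Name")
    amt_in_local: float = pydantic.Field(alias="Amount in local currency")


def columns_from_model(model: type[pydantic.BaseModel]) -> dict[str, Any]:
    """Map each expected DataFrame column to its type annotation.

    Hypothetical helper (not a pandera API): the column name is the
    field's alias when one is set, otherwise the field name.
    """
    return {
        (info.alias or name): info.annotation
        for name, info in model.model_fields.items()
    }


columns = columns_from_model(AliasedRecord)
print(columns)
```

With a mapping like this, the schema would know about 'Name' and 'Amount in local currency' up front, which is the information the strict check appears to be missing in the error above.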

Turning this into a bug issue in case anyone wants to open a PR!

cosmicBboy avatar Oct 19 '23 13:10 cosmicBboy

I would like to have a crack at this please

patelnets avatar Dec 11 '23 18:12 patelnets

One thing that would be nice to add to the pandera/pydantic integration is the ability to output field aliases, for example something like PydanticModel(AliasedRecord, by_alias=True). Otherwise I don't think we're able to output a validated dataframe with aliased column names.

benlindsay avatar Apr 22 '24 19:04 benlindsay

As an example of what I'm talking about:

import pandas as pd
import pandera as pa
from pandera.engines.pandas_engine import PydanticModel
from pydantic import BaseModel, Field


class Schema(pa.DataFrameModel):
    col_2020: pa.typing.Series[int] = pa.Field(alias="Col 2020")


df = pd.DataFrame({"Col 2020": [99, 100]})

print(Schema.validate(df))
#    Col 2020
# 0        99
# 1       100


class SchemaRow(BaseModel):
    col_2020: int = Field(..., alias="Col 2020")


class PydanticSchema(pa.DataFrameModel):
    class Config:
        dtype = PydanticModel(SchemaRow)
        coerce = True

print(PydanticSchema.validate(df))
#    col_2020
# 0        99
# 1       100

If you make a PydanticModel+Pandera equivalent of a standard Pandera model with an alias, the validation behavior differs: the standard Pandera model retains the column alias, whereas the PydanticModel+Pandera version reverts from the field alias to the field name. I had to abandon using the convenient @pa.check_types decorator for some functions in an app I'm working on because of this.
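Until something like by_alias exists, one possible manual workaround is to rename the columns back to their aliases after validation. The sketch below assumes pydantic v2 and uses a hypothetical helper, `restore_aliases`, that is not part of pandera; the `validated` frame is a stand-in for the output of PydanticSchema.validate(df) shown above.

```python
import pandas as pd
from pydantic import BaseModel, Field


class SchemaRow(BaseModel):
    col_2020: int = Field(..., alias="Col 2020")


def restore_aliases(df: pd.DataFrame, model: type[BaseModel]) -> pd.DataFrame:
    """Rename field-name columns back to their pydantic aliases.

    Hypothetical helper (not a pandera API). Fields without an alias
    are left untouched.
    """
    name_to_alias = {
        name: info.alias
        for name, info in model.model_fields.items()
        if info.alias is not None
    }
    return df.rename(columns=name_to_alias)


# Stand-in for the frame returned by PydanticSchema.validate(df),
# whose column has reverted to the field name.
validated = pd.DataFrame({"col_2020": [99, 100]})
restored = restore_aliases(validated, SchemaRow)
print(list(restored.columns))
```

This has to be applied by hand after each validation, so it does not bring back the convenience of @pa.check_types.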

benlindsay avatar Apr 30 '24 20:04 benlindsay