Bugfix/1446: Ensure Pydantic Models Can Be Created with `typing.pyspark.DataFrame` or `typing.pyspark_sql.DataFrame` Generic

Open brayan07 opened this issue 1 year ago • 4 comments

This PR resolves the issue reported in #1446, where any Pydantic model with a field typed as pandera.typing.pyspark.DataFrame or pandera.typing.pyspark_sql.DataFrame always raises a confusing ValidationError.

For clarity, we want to make sure the following leads to the expected behavior:

import pyspark.sql.types as T

from pandera.pyspark import DataFrameModel, Field
from pandera.typing.pyspark_sql import DataFrame
from pydantic import BaseModel
from pyspark.sql import SparkSession


class SampleSchema(DataFrameModel):
    """
    Sample schema model with data checks.
    """

    product: T.StringType() = Field()
    price: T.IntegerType() = Field()


class PydanticContainer(BaseModel):
    """
    Pydantic container with a DataFrameModel as a field.
    """

    data: DataFrame[SampleSchema]

    class Config:
        arbitrary_types_allowed = True

We do this by creating a _PydanticIntegrationMixIn that can be used by both pandera.typing.pyspark_sql.DataFrame and pandera.typing.pyspark.DataFrame.

The content of the mixin is a variation of the methods used in pandera.typing.pandas.DataFrame.
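To illustrate the shape of the idea only, here is a minimal, dependency-free sketch of a shared mixin that exposes the pydantic v1 `__get_validators__` hook and validates eagerly. Everything in it (`ToySchema`, `PydanticIntegrationSketch`, `SqlFrame`, `PandasOnSparkFrame`, and the list-of-column-names stand-in for a dataframe) is hypothetical and is not the code in this PR.

```python
from typing import Any, Callable, Iterator, List


class ToySchema:
    """Hypothetical stand-in for a DataFrameModel: checks column names only."""

    required = ("product", "price")

    @classmethod
    def validate(cls, columns: List[str]) -> List[str]:
        missing = [name for name in cls.required if name not in columns]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return columns


class PydanticIntegrationSketch:
    """Hypothetical mixin: defines the pydantic v1 validator hook in one place."""

    schema: Any = None  # each concrete dataframe type supplies its schema

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[[Any], Any]]:
        # pydantic v1 calls this hook to collect validators for a custom type.
        yield cls.pydantic_validate

    @classmethod
    def pydantic_validate(cls, obj: Any) -> Any:
        # Validate eagerly, so bad data fails when the model is constructed.
        return cls.schema.validate(obj)


class SqlFrame(PydanticIntegrationSketch):
    """Stand-in for the pyspark_sql dataframe type."""

    schema = ToySchema


class PandasOnSparkFrame(PydanticIntegrationSketch):
    """Stand-in for the pyspark.pandas dataframe type."""

    schema = ToySchema
```

Both stand-in types inherit the same hook, which is the point of the mixin: the Pydantic integration lives in one class instead of being duplicated per dataframe type.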

Note: We assume that any PySpark dataframe used in a Pydantic model will be validated eagerly, for both pyspark.pandas and pyspark_sql. The default behavior for pyspark_sql dataframes is normally lazy validation, but that makes less sense to me inside a Pydantic model.

brayan07 avatar Dec 15 '23 15:12 brayan07

Thanks for the PR @brayan07! Looks like there are some lint and unit test errors. Be sure to run the tests and set up pre-commit in your dev env to make sure those are passing.

cosmicBboy avatar Dec 18 '23 16:12 cosmicBboy

Still running into local test failures that are unrelated to the new code. Will try to resolve them before pushing again. Thanks!

brayan07 avatar Dec 19 '23 15:12 brayan07

I'm getting the same failed tests locally for the main branch, as well as for this branch, with `make nox-conda`. I don't think it's what I added but something in the dev setup. Would it be alright if we ran the CI workflow one more time to help me debug?

brayan07 avatar Dec 19 '23 15:12 brayan07

Hi @brayan07 sorry for the delayed review on this!

I believe the test errors are coming from `from pydantic import GetCoreSchemaHandler`. Will need to move that import into the `PYDANTIC_V2` conditional.

cosmicBboy avatar Apr 13 '24 15:04 cosmicBboy