Type checking reports error when using a subclass of the specified DataFrameModel

Open thundercat1 opened this issue 1 year ago • 5 comments

mypy reports type errors when trying to use a subclass of a DataFrameModel in a scenario where the parent class would be accepted. Ordinarily I'd expect that a subclass would satisfy a type hint asking for the parent class.

In the code sample below, type checking fails, but the script executes successfully (pydantic validation succeeds). Is there a better way I should be declaring this kind of type hint, or is this a bug that can/should be fixed?

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [ ] (optional) I have confirmed this bug exists on the master branch of pandera.

Environment:

  • pandera 0.14.5
  • mypy 1.1.1
  • Python 3.10.9

import pandera as pa
from pandera.typing import Series, DataFrame
import pandas as pd


class Animal(pa.DataFrameModel):
    name: Series[str]


class Dog(Animal):
    breed: Series[str]


@pa.check_types(with_pydantic=True)
def say_hello(animal: DataFrame[Animal]):
    # Says hello to any animal
    print(f"Hello {animal.name}!")


dog = DataFrame[Dog](pd.DataFrame({"name": ["Fido"], "breed": ["Labrador"]}))

# Since `Dog` is a subclass of `Animal`, this should be valid
say_hello(dog)

# However, mypy reports an error:
"""
mypy pandera_type_test.py --show-traceback

pandera_type_test.py:22: error: Argument 1 to "say_hello" has incompatible type "DataFrame[Dog]"; expected "DataFrame[Animal]"  [arg-type]
Found 1 error in 1 file (checked 1 source file)
"""

thundercat1 avatar Apr 27 '23 19:04 thundercat1

@cosmicBboy I just wanted to follow up on this. This case seems different from the overwriting case described in the documented mypy false positives. Any thoughts or insights on this? It seems more related to how DataFrameModels inherit from one another than to a pandas-stubs or mypy issue.

kr-hansen avatar May 10 '23 19:05 kr-hansen

To further confirm that this is specific to how pandera DataFrameModels handle inheritance, I took the example from @thundercat1 and ported it to pydantic models, which don't show this inheritance issue. Example below:

from pydantic import validate_arguments, BaseModel


class Animal(BaseModel):
    name: str


class Dog(Animal):
    breed: str


@validate_arguments(config=dict(arbitrary_types_allowed=True))
def say_hello(animal: Animal):
    # Says hello to any animal
    print(f"Hello {animal.name}!")


dog = Dog(name="Fido", breed="Labrador")

# Since `Dog` is a subclass of `Animal`, this should be valid
say_hello(dog)

# No mypy error in the case of `pydantic` models
"""
mypy pydantic_type_test.py --show-traceback

Success: no issues found in 1 source file
"""

Since this does seem to be pandera specific, do you have any thoughts on where to look to understand why pandera model inheritance isn't behaving as expected here? Perhaps it has something to do with the usage of the DataFrame generic type?

kr-hansen avatar May 10 '23 19:05 kr-hansen

All right @cosmicBboy I think I figured it out. It has to do with the DataFrame generic type and covariance & contravariance and the fact that mypy assumes generic types are invariant by default.
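
As a rough illustration of the invariance point, here is a minimal sketch that uses only the standard `typing` module and no pandera at all. `InvariantBox`, `CovariantBox`, and the two `greet_*` functions are hypothetical names made up for this example. Both calls run fine at runtime; the difference only shows up when the file is checked with mypy.

```python
from typing import Generic, TypeVar


class Animal:
    def __init__(self, name: str) -> None:
        self.name = name


class Dog(Animal):
    pass


T = TypeVar("T")  # invariant, which is the default
T_co = TypeVar("T_co", covariant=True)  # explicitly covariant


class InvariantBox(Generic[T]):
    """Hypothetical container that is invariant in its type parameter."""

    def __init__(self, item: T) -> None:
        self._item = item

    def get(self) -> T:
        return self._item


class CovariantBox(Generic[T_co]):
    """Hypothetical read-only container that is covariant in its type parameter."""

    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item


def greet_invariant(box: InvariantBox[Animal]) -> str:
    return f"Hello {box.get().name}!"


def greet_covariant(box: CovariantBox[Animal]) -> str:
    return f"Hello {box.get().name}!"


dog_inv: InvariantBox[Dog] = InvariantBox(Dog("Fido"))
dog_co: CovariantBox[Dog] = CovariantBox(Dog("Fido"))

# mypy reports an [arg-type] error here, because InvariantBox[Dog] is not
# treated as a subtype of InvariantBox[Animal]. The call still runs.
print(greet_invariant(dog_inv))

# mypy accepts this call, because CovariantBox[Dog] is a subtype of
# CovariantBox[Animal] when the type variable is covariant.
print(greet_covariant(dog_co))
```

The invariant case mirrors the `DataFrame[Dog]` versus `DataFrame[Animal]` error reported at the top of this issue.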

The following example adapted from @thundercat1's works fine for me:

from typing import Generic, TypeVar

import pandera as pa
from pandera.typing import Series, DataFrame
import pandas as pd

TDataFrame_co = TypeVar("TDataFrame_co", covariant=True)

class DataFrame_co(DataFrame, Generic[TDataFrame_co]):
    pass

class Animal(pa.DataFrameModel):
    name: Series[str]


class Dog(Animal):
    breed: Series[str]


@pa.check_types(with_pydantic=True)
def say_hello(animal: DataFrame[Animal]):
    # Says hello to any animal
    print(f"Hello {animal.name}!")


dog = DataFrame_co[Dog](pd.DataFrame({"name": ["Fido"], "breed": ["Labrador"]}))

# Since `Dog` is a subclass of `Animal`, this should be valid
say_hello(dog)

# No mypy error when the argument is built with the covariant `DataFrame_co` wrapper
"""
mypy pandera_type_test_covariant.py --show-traceback

Success: no issues found in 1 source file
"""

I'm guessing the pydantic case works because of this definition in the code and how that probably gets perpetuated.

I briefly poked around in the code to see if I could tweak DataFrame in the source and hit a couple of issues. Happy to try implementing a fix in a PR if you have any further suggestions @cosmicBboy:

  1. I added covariant=True to this TypeVar and hit errors on the .from_records static method, because that method uses the same TypeVar. mypy reports a [misc] error: "Cannot use a covariant type variable as a parameter". I'm not sure how best to resolve that error, given how the TypeVar is used within that method, without messing things up too much.
  2. I'm also getting 25 mypy [union-attr] errors on trunk in the pandera/backends/pandas directory, so I didn't dig into resolving those either.
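
For the first issue, one pattern that sometimes helps is giving the static constructor its own invariant TypeVar, so the covariant one never appears as a parameter annotation. The sketch below is only an illustration of that idea. `Frame`, `from_rows`, and `S` are hypothetical stand-ins rather than pandera's actual `DataFrame` or `from_records`, and this has not been tried against the pandera source.

```python
from typing import Generic, Type, TypeVar

T_co = TypeVar("T_co", covariant=True)
S = TypeVar("S")  # separate invariant TypeVar, used only by the static method


class Animal:
    pass


class Dog(Animal):
    pass


class Frame(Generic[T_co]):
    """Hypothetical stand-in for a generic, covariant data container."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    # Annotating a parameter directly with the covariant TypeVar, e.g.
    #     def from_rows(schema: T_co, rows: list) -> "Frame[T_co]": ...
    # is what mypy flags with
    # "Cannot use a covariant type variable as a parameter".

    @staticmethod
    def from_rows(schema: Type[S], rows: list) -> "Frame[S]":
        # S is bound per call, so the class-level covariant T_co is never
        # used in a parameter position.
        return Frame[S](rows)


def count_rows(frame: Frame[Animal]) -> int:
    return len(frame.rows)


dog_frame = Frame.from_rows(Dog, [{"name": "Fido", "breed": "Labrador"}])

# With T_co covariant, Frame[Dog] is acceptable where Frame[Animal] is expected.
print(count_rows(dog_frame))
```

Whether this carries over depends on how the real method uses its TypeVar, which the sketch does not try to reproduce.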

I'm also not sure whether you'd want something similar in the GenericDType to get the same behavior for Series, or whether this is something you'd want integrated into the code base at all. It looks like the above workaround may work OK for our use case without modifying the base code, but I thought I'd pass this back up and get your thoughts.

EDIT: This workaround actually doesn't fully work in our case, so we're still kinda stuck on this.

kr-hansen avatar May 12 '23 01:05 kr-hansen

I am facing a similar issue - I'd like the DataFrame to be covariant on its generic type argument. Is there any plan to move in this direction?

bartwozniak avatar Jan 10 '24 16:01 bartwozniak

Just ran into this issue w/ pandera v0.20.3 & Python 3.12.5.

trey-stafford avatar Aug 14 '24 15:08 trey-stafford