
Return type of pipe not consistent with types at run time

Open davetapley opened this issue 1 year ago • 1 comments

Describe the bug

Using pipe returns an instance of the type pipe is called on, but the type stubs imply it's the return type of the function being applied by pipe.

i.e. if my function returns a DataFrame, but I call pipe on a class which inherits from DataFrame, then at run time I get back the subclass, but the typing implies it's just a vanilla DataFrame.

To Reproduce

Subclass DataFrame per the docs.

Create a pipe function using this signature:

def func(df: DataFrame) -> DataFrame:

Observe that at run time, if I use pipe on the subtype, I get back an instance of the subtype, which is nice:

class SubDF(DataFrame):
   ... # required other stuff

sub = SubDF()
sub_f = sub.pipe(func)
type(sub_f)  # is SubDF

But if you hover over sub_f in VSCode, its type is DataFrame.

Please complete the following information:

  • OS: Linux
  • OS Version 20.04.6
  • python version 3.11.7
  • version of type checker: pyright version 1.1.356 (commit 6652c4a8)
  • version of installed pandas-stubs 2.2.1.240316

Additional context

The offending type is the T here; it should return Self:

https://github.com/pandas-dev/pandas-stubs/blob/072997b077ad4f766272e9bb2d03fc3771829270/pandas-stubs/core/generic.pyi#L369-L374

davetapley avatar Apr 18 '24 19:04 davetapley

Thanks for the report. PR with tests welcome.

Dr-Irv avatar Apr 18 '24 20:04 Dr-Irv

The type annotations of pipe are correct -- it returns the same type returned by the input function. In your case, the function is declared to return a DataFrame so this is what you get from pipe. If you want the function func to work with DataFrame AND its subclasses, you have to do something like this:

from typing import TypeVar
from pandas import DataFrame

DataFrameT = TypeVar("DataFrameT", bound=DataFrame)

def func(df: DataFrameT) -> DataFrameT: return df

class SubDF(DataFrame):
   ... # required other stuff

sub = SubDF()
sub_f = sub.pipe(func)
reveal_type(sub_f) # Type of "sub_f" is "SubDF" (Pylance)

hamdanal avatar May 26 '24 16:05 hamdanal

> The type annotations of pipe are correct -- it returns the same type returned by the input function. In your case, the function is declared to return a DataFrame so this is what you get from pipe. If you want the function func to work with DataFrame AND its subclasses, you have to do something like this:

I'd still like to see if returning Self would also fix the problem.

Dr-Irv avatar May 28 '24 21:05 Dr-Irv

> I'd still like to see if returning Self would also fix the problem.

I don’t know what this means, pipe already uses Self, that’s why the example I gave above works.

hamdanal avatar May 28 '24 21:05 hamdanal

> > I'd still like to see if returning Self would also fix the problem.
>
> I don’t know what this means, pipe already uses Self, that’s why the example I gave above works.

See the suggestion above. Right now, def pipe() in pandas-stubs/core/generic.pyi is returning T. The suggestion is to change it to return Self.

Dr-Irv avatar May 28 '24 21:05 Dr-Irv

> See the suggestion above. Right now, def pipe() in pandas-stubs/core/generic.pyi is returning T. The suggestion is to change it to return Self.

No, that wouldn't work. pipe returns exactly what the function passed to it returns, not a copy of "self". If you run this example:

import pandas as pd
df = pd.DataFrame(data={"A": [1, 2], "B": [3, 4]})

def f(df: pd.DataFrame) -> int:
    return df.size

def g(df: pd.DataFrame) -> pd.Series:
    return df["A"]

res_f = df.pipe(f)
print(res_f, type(res_f))

res_g = df.pipe(g)
print(res_g, type(res_g))

You get:

4 <class 'int'>
0    1
1    2
Name: A, dtype: int64 <class 'pandas.core.series.Series'>

hamdanal avatar May 28 '24 21:05 hamdanal

> No, that wouldn't work. pipe returns exactly what the function passed to it returns, not a copy of "self". If you run this example:

Thanks for your analysis. Your solution at https://github.com/pandas-dev/pandas-stubs/issues/908#issuecomment-2132279017 is how the OP should handle this.

Dr-Irv avatar May 28 '24 22:05 Dr-Irv