pandas-stubs
Return type of pipe not consistent with types at run time
Describe the bug
Using pipe returns an instance of the type pipe is called on, but the type stubs imply it's the type returned by the function being applied by pipe.
i.e. if my function returns a DataFrame, but I call pipe on a class which inherits from DataFrame, then at run time I get back the subclass, but the typing implies it's just a vanilla DataFrame.
To Reproduce
Subclass DataFrame per the docs.
Create a pipe function using this signature:
def func(df: DataFrame) -> DataFrame:
Observe that at run time, if I use pipe on the subtype, I get back an instance of the subtype, which is nice:
class SubDF(DataFrame):
    ...  # required other stuff
sub = SubDF()
sub_f = sub.pipe(func)
type(sub_f)  # is SubDF at run time
But if you hover over sub_f in VSCode, its type is shown as DataFrame.
Please complete the following information:
- OS: Linux
- OS Version 20.04.6
- python version 3.11.7
- version of type checker: pyright version 1.1.356 (commit 6652c4a8)
- version of installed pandas-stubs: 2.2.1.240316
Additional context
The offending type is the T here; it should return Self:
https://github.com/pandas-dev/pandas-stubs/blob/072997b077ad4f766272e9bb2d03fc3771829270/pandas-stubs/core/generic.pyi#L369-L374
Thanks for the report. PR with tests welcome.
The type annotations of pipe are correct -- it returns the same type returned by the input function. In your case, the function is declared to return a DataFrame so this is what you get from pipe. If you want the function func to work with DataFrame AND its subclasses, you have to do something like this:
from typing import TypeVar
from pandas import DataFrame

# Bounded TypeVar: accepts DataFrame or any subclass and preserves the exact type.
DataFrameT = TypeVar("DataFrameT", bound=DataFrame)

def func(df: DataFrameT) -> DataFrameT:
    return df

class SubDF(DataFrame):
    ...  # required other stuff

sub = SubDF()
sub_f = sub.pipe(func)
reveal_type(sub_f)  # Type of "sub_f" is "SubDF" (Pylance)
I'd still like to see if returning Self would also fix the problem.
I don’t know what this means, pipe already uses Self, that’s why the example I gave above works.
See the suggestion above. Right now, def pipe() in pandas-stubs/core/generic.pyi is returning T. The suggestion is to change it to return Self.
No, that wouldn't work. pipe returns exactly what the function passed to it returns, not a copy of "self". If you run this example:
import pandas as pd

df = pd.DataFrame(data={"A": [1, 2], "B": [3, 4]})

def f(df: pd.DataFrame) -> int:
    return df.size

def g(df: pd.DataFrame) -> pd.Series:
    return df["A"]

res_f = df.pipe(f)
print(res_f, type(res_f))
res_g = df.pipe(g)
print(res_g, type(res_g))
You get:
4 <class 'int'>
0    1
1    2
Name: A, dtype: int64 <class 'pandas.core.series.Series'>
Thanks for your analysis. Your solution at https://github.com/pandas-dev/pandas-stubs/issues/908#issuecomment-2132279017 is how the OP should handle this.