pandas-stubs
pandas-stubs copied to clipboard
groupby with MultiIndex not annotated correctly
Consider a DF or Series with a MultiIndex M1, M2, M3.
When grouping as in
grp = df.groupby(level=["M1", "M3"])
You will get an error like
Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
Type "list[str]" cannot be assigned to type "Level | None"
"list[str]" is incompatible with protocol "Hashable"
This is due to the annotation of groupby as
by: IntervalIndex[IntervalT],
axis: AxisIndex = ...,
level: Level | None = ...,
The type "Level" is an alias for "Hashable" and allows only one level. I think it should be Hashable|Sequence[Hashable] or, as pandas calls it, IndexLabel (+ the | None)
I've identified a few places (see attachment). However, I am not sure the patch is correct/complete. It is interesting to see that stack is annotated with level: Level | list[Level] = ... while unstack has level: Level = ... while IMHO both should be IndexLabel.
Also, I am not sure the groupby annotations are completey correct because I think you cannot use df.groupby(by=not_None, level=other_not_None) at the same time, yet the annotations seems to allow it.
Minimum example code:
import pandas as pd
df = pd.DataFrame(
data={"a": [0]},
index=pd.MultiIndex.from_product(
[pd.Index(["i1a"], name="i1"), pd.Index(["i2a"], name="i2")]
),
)
grp = df.groupby(level=["i1", "i2"])
pyright outputs:
pyright pd-stub-test.py
pd-stub-test.py
pd-stub-test.py:9:24 - error: Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
Type "list[str]" cannot be assigned to type "Level | None"
"list[str]" is incompatible with protocol "Hashable"
"__hash__" is an incompatible type
Type "None" cannot be assigned to type "(self: list[str]) -> int"
"list[str]" is incompatible with "int"
Type cannot be assigned to type "None" (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations
0001-MultiIndex-in-groupby.patch.gz
(Warning: I haven't tested/verified the patch!)
Can you indicate which version of pandas-stubs you are using?
I am using pandas-stubs==1.5.3.230321 but you can find the issue in the main branch (pulled appox 1h ago) as well. The attached patch is against the current main.
See for example https://github.com/pandas-dev/pandas-stubs/blob/main/pandas-stubs/core/frame.pyi line 1077.
But as said, I am not sure the patch is correct - I am not exactly sure if/when label/by combinations are allowed in df.groupby. But you can definitely have multiple levels in level=.
OK, thanks. Can you create a PR (with tests) with your change?
I see now that the docs say that you can't use by and level, so I think the proper change here is to remove level from all the existing overloads, and then create a new overload that has level but not by (and probably a * to indicate that keyword arguments are required if you use level)
Sorry, I misread. Yes, I can. Probably tomorrow.
This happens with a Series as well:
grp = series.groupby(level=["l1", "l2"])
...
No overload variant of "groupby" of "Series" matches argument type "list[str]" [call-overload]
due to this https://github.com/pandas-dev/pandas-stubs/blob/6bb12157a810cc26e034e012bff6a0ec2da191d0/pandas-stubs/core/series.pyi#L609
This seems to have been fixed (or I can't reproduce it anymore on the latest version).
Now the typing is IndexLabel | None = ... so this could probably be closed @Dr-Irv. Let me know if I missed something.
Yes, seems like #837 closed this.