pandas-stubs groupby with MultiIndex not annotated correctly

Consider a DF or Series with a MultiIndex M1, M2, M3.

When grouping as in grp = df.groupby(level=["M1", "M3"])

You will get an error like

Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
  Type "list[str]" cannot be assigned to type "Level | None"
    "list[str]" is incompatible with protocol "Hashable"

This is due to the annotation of groupby as

        by: IntervalIndex[IntervalT],
        axis: AxisIndex = ...,
        level: Level | None = ...,

The type "Level" is an alias for "Hashable" and allows only one level. I think it should be Hashable|Sequence[Hashable] or, as pandas calls it, IndexLabel (+ the | None)

I've identified a few places (see attachment). However, I am not sure the patch is correct/complete. It is interesting to see that stack is annotated with level: Level | list[Level] = ... while unstack has level: Level = ... while IMHO both should be IndexLabel.

Also, I am not sure the groupby annotations are completey correct because I think you cannot use df.groupby(by=not_None, level=other_not_None) at the same time, yet the annotations seems to allow it.

Minimum example code:

import pandas as pd

df = pd.DataFrame(
    data={"a": [0]},
    index=pd.MultiIndex.from_product(
        [pd.Index(["i1a"], name="i1"), pd.Index(["i2a"], name="i2")]
    ),
)
grp = df.groupby(level=["i1", "i2"])

pyright outputs:

pyright pd-stub-test.py
pd-stub-test.py
  pd-stub-test.py:9:24 - error: Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
    Type "list[str]" cannot be assigned to type "Level | None"
      "list[str]" is incompatible with protocol "Hashable"
        "__hash__" is an incompatible type
          Type "None" cannot be assigned to type "(self: list[str]) -> int"
      "list[str]" is incompatible with "int"
      Type cannot be assigned to type "None" (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations

0001-MultiIndex-in-groupby.patch.gz

(Warning: I haven't tested/verified the patch!)

May 22 '23 13:05 behrenhoff

Can you indicate which version of pandas-stubs you are using?

May 22 '23 14:05 Dr-Irv

I am using pandas-stubs==1.5.3.230321 but you can find the issue in the main branch (pulled appox 1h ago) as well. The attached patch is against the current main. See for example https://github.com/pandas-dev/pandas-stubs/blob/main/pandas-stubs/core/frame.pyi line 1077.

But as said, I am not sure the patch is correct - I am not exactly sure if/when label/by combinations are allowed in df.groupby. But you can definitely have multiple levels in level=.

May 22 '23 14:05 behrenhoff

OK, thanks. Can you create a PR (with tests) with your change?

I see now that the docs say that you can't use by and level, so I think the proper change here is to remove level from all the existing overloads, and then create a new overload that has level but not by (and probably a * to indicate that keyword arguments are required if you use level)

May 22 '23 14:05 Dr-Irv

Sorry, I misread. Yes, I can. Probably tomorrow.

May 22 '23 14:05 behrenhoff

This happens with a Series as well:

grp = series.groupby(level=["l1", "l2"])
...
No overload variant of "groupby" of "Series" matches argument type "list[str]"  [call-overload]

due to this https://github.com/pandas-dev/pandas-stubs/blob/6bb12157a810cc26e034e012bff6a0ec2da191d0/pandas-stubs/core/series.pyi#L609

Sep 06 '23 19:09 janrito

This seems to have been fixed (or I can't reproduce it anymore on the latest version). Now the typing is IndexLabel | None = ... so this could probably be closed @Dr-Irv. Let me know if I missed something.

Sep 23 '24 23:09 loicdiridollou

Yes, seems like #837 closed this.

Sep 24 '24 21:09 Dr-Irv

pandas-stubs pandas-stubs copied to clipboard

groupby with MultiIndex not annotated correctly

pandas-stubs
pandas-stubs copied to clipboard