pandas-stubs

type annotation for Index/MultiIndex.names is incorrect

Open tswast opened this issue 1 year ago • 6 comments

Describe the bug

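The type annotation for Index/MultiIndex.names is incorrect: it is declared as list[str], but index names can be None, and the setter also accepts tuples at runtime.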

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"a": ["a", "b", "c"], "i": [10, 11, 12]}, index=pd.Index([5, 10, 15], name="idx"))

In [4]: df.index.names
Out[4]: FrozenList(['idx'])

In [5]: df.index.names = (None,)

In [6]: df.index.names
Out[6]: FrozenList([None])
  2. Indicate which type checker you are using (mypy or pyright).

mypy

  3. Show the error message received from that type checker while checking your example.

bigframes/core/blocks.py:436: error: Incompatible types in assignment (expression has type "tuple[None]", variable has type "list[str]") [assignment]

Please complete the following information:

  • OS: Linux
  • OS Version: Debian GNU/Linux rodete (Distributor ID: Debian, Release: n/a, Codename: rodete)

  • Python version: 3.10.9
  • Type checker version: mypy==1.5.1
  • pandas-stubs version: pandas-stubs==2.0.3.230814


tswast, Oct 26 '23 21:10

There are a few things going on here, and I don't think we can do much about them.

First, from a static typing perspective, we can't track the type of DataFrame.index. It could be a regular Index or a MultiIndex. So if you know your DataFrame is backed by a MultiIndex, you'd have to cast df.index to the MultiIndex type.
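A minimal sketch of that cast, assuming a frame built with an explicit two-level MultiIndex (the column and level names are invented for illustration). typing.cast only informs the type checker; it performs no conversion or check at runtime.

```python
from typing import cast

import pandas as pd

# Hypothetical frame whose index is known to be a MultiIndex.
df = pd.DataFrame(
    {"v": [1, 2]},
    index=pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["letter", "num"]),
)

# Statically, df.index is annotated as a plain Index; cast() narrows it for
# the checker so MultiIndex-only attributes such as .levels type-check.
mi = cast(pd.MultiIndex, df.index)
print(len(mi.levels), list(mi.names))
```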

Second, whether the underlying index is single-dimensional or a MultiIndex, df.index.names will return a list of strings. It is also possible to return a list of None if you clobber the names, but if we use static typing to declare that Index.names returns a list[str | None], that will force more people to cast the result of Index.names.
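The trade-off can be sketched as follows. The snippet assumes a hypothetical stub in which the getter returns list[str | None]; under that annotation a checker would require the None case to be handled before a str-only method such as upper() is called. The frame and the label variable are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.Index([5, 10, 15], name="idx"))

# Under a hypothetical list[str | None] annotation, first could be None,
# so a checker would reject first.upper() until that case is handled.
first = df.index.names[0]
if first is not None:
    label = first.upper()
else:
    label = ""
print(label)
```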

Finally, you reported a mypy error on your own code, but we'd prefer an example that is self-contained and can be run directly through the type checker. An IPython session can't be used that way.

I'm going to close this, but am willing to reopen it if you can convince me otherwise.

Dr-Irv, Oct 26 '23 22:10

names is available on both Index and MultiIndex, so I think that's a moot point with regard to this issue.
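For reference, a short check that names behaves the same way on both classes (the example indexes are invented): each returns one entry per level, and an unnamed level appears as None.

```python
import pandas as pd

flat = pd.Index([5, 10, 15], name="idx")
multi = pd.MultiIndex.from_arrays([[1, 2], ["x", "y"]], names=["num", None])

# Both classes expose .names with one entry per level; an unnamed level is None.
print(list(flat.names))
print(list(multi.names))
```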

Isn't list[str] incorrect though? Unnamed indexes are incredibly common. Here's a standalone example:

(dev-3.10-pip) ➜  pandas-stubs-804 cat sample.py
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c"], "i": [10, 11, 12]})
print(df.index.names)

# OK
df.index.names = ["idx"]
print(df.index.names)

# Not OK, but works
df.index.names = ("idx2",)
print(df.index.names)
df.index.names = [None]
print(df.index.names)
df.index.names = (None,)
print(df.index.names)
(dev-3.10-pip) ➜  pandas-stubs-804 mypy sample.py
sample.py:11: error: Incompatible types in assignment (expression has type "tuple[str]", variable has type "list[str]")  [assignment]
sample.py:13: error: List item 0 has incompatible type "None"; expected "str"  [list-item]
sample.py:15: error: Incompatible types in assignment (expression has type "tuple[None]", variable has type "list[str]")  [assignment]
Found 3 errors in 1 file (checked 1 source file)

tswast, Oct 26 '23 23:10

@Dr-Irv, I've added a standalone sample demonstrating the issue.

tswast, Oct 26 '23 23:10

Output of sample.py, showing that the default name is None:

(dev-3.10-pip) ➜  pandas-stubs-804 python sample.py
[None]
['idx']
['idx2']
[None]
[None]

tswast, Oct 26 '23 23:10

Thanks @tswast for the example. I will reopen.

The property for names should be updated here: https://github.com/pandas-dev/pandas-stubs/blob/9aac8e31ba69eb4c0583e55dd2198755fb031620/pandas-stubs/core/indexes/base.pyi#L291

in two ways:

  1. The "getter" should return list[str | None]
  2. The "setter" should allow any SequenceNotStr[str] (but not a general Sequence[str], since a bare str is itself a Sequence[str])
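A rough sketch of the shape of those two annotations, written as ordinary Python so it can run. SequenceNotStr here is a simplified, hypothetical Protocol standing in for the real pandas-stubs type; it rejects a bare str because str.__contains__ only accepts str, while list and tuple accept any object. IndexSketch is likewise a hypothetical stand-in for Index. The comment above proposes SequenceNotStr[str] for the setter; the element type is widened to str | None here as an assumption, so that the (None,) assignment from sample.py is also accepted. Some type-checker versions check property assignments against the getter's type rather than the setter's, so a setter type that differs from the getter may not be honoured everywhere.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)


class SequenceNotStr(Protocol[_T_co]):
    """Simplified, hypothetical stand-in for pandas-stubs' SequenceNotStr.

    str.__contains__ only accepts str, so str does not satisfy
    __contains__(value: object) and a type checker rejects it,
    while list and tuple are accepted.
    """

    def __getitem__(self, index: int, /) -> _T_co: ...
    def __contains__(self, value: object, /) -> bool: ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[_T_co]: ...


class IndexSketch:
    """Hypothetical stand-in for Index, showing only the names property."""

    def __init__(self) -> None:
        self._names: list[str | None] = [None]

    @property
    def names(self) -> list[str | None]:
        # Getter: unnamed levels show up as None.
        return list(self._names)

    @names.setter
    def names(self, values: SequenceNotStr[str | None]) -> None:
        # Setter: any list or tuple of names, but not a bare str (statically).
        self._names = list(values)


idx = IndexSketch()
idx.names = ("idx2",)
idx.names = (None,)
print(idx.names)
```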

PR with tests welcome.

Dr-Irv, Oct 27 '23 12:10

Using str | None sounds reasonable. Technically, any type seems to be accepted at runtime:

>>> df.index.names = [None]
>>> df.index.names
FrozenList([None])

>>> df.index.names = [1]
>>> df.index.names
FrozenList([1])

The pandas-internal annotations declare name as any hashable object: https://github.com/pandas-dev/pandas/blob/e86ed377639948c64c429059127bcf5b359ab6be/pandas/core/indexes/base.py#L1657C5-L1657C32

I couldn't find an annotation for names, but the docstring here says that the elements have to be hashable: https://github.com/pandas-dev/pandas/blob/e86ed377639948c64c429059127bcf5b359ab6be/pandas/core/indexes/base.py#L1753
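The runtime behaviour described in this comment can be checked with a short script. It assumes a recent pandas (2.x) and relies only on the exception types, since messages may differ between versions: a non-str hashable name is accepted, an unhashable one raises TypeError, and a bare string raises ValueError because it is not treated as a list of names.

```python
import pandas as pd

idx = pd.Index([5, 10, 15])

# Any hashable object is accepted as a name, not only str or None.
idx.names = [1]

# An unhashable element (a list) is rejected before anything is assigned.
rejected_unhashable = False
try:
    idx.names = [["not", "hashable"]]
except TypeError:
    rejected_unhashable = True

# A bare string is not accepted as a sequence of names.
rejected_bare_str = False
try:
    idx.names = "idx"
except ValueError:
    rejected_bare_str = True

print(list(idx.names), rejected_unhashable, rejected_bare_str)
```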

twoertwein, Oct 27 '23 17:10