pandas-stubs icon indicating copy to clipboard operation
pandas-stubs copied to clipboard

Type checking with `converters=` on `read_excel`

Open oliverbeagley-pgg opened this issue 1 year ago • 4 comments

Describe the bug Pyright doesn't seem to like something about using converters=... with read_excel

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.
from functools import partial

import pandas as pd

df_1 = pd.read_excel(
    "foo.csv",
    converters={"field_1": partial(pd.to_datetime, errors="coerce")},
)
df_2 = pd.read_excel(
    "foo.csv",
    sheet_name="sheet",
    converters={"field_1": partial(pd.to_datetime, errors="coerce")},
)

reveal_type(df_1)
reveal_type(df_2)
  1. Indicate which type checker you are using (mypy or pyright).: Both
  2. Show the error message received from that type checker while checking your example.

mypy:

$ mypy t.py
t.py:15: note: Revealed type is "pandas.core.frame.DataFrame"
t.py:16: note: Revealed type is "pandas.core.frame.DataFrame"
Success: no issues found in 1 source file

pyright:

/home/user/t.py
  /home/user/t.py:7:16 - error: Argument of type "dict[str, partial[Timestamp]]" cannot be assigned to parameter "converters" of type "dict[int | str, (object) -> object] | None" in function "read_excel"
    Type "dict[str, partial[Timestamp]]" cannot be assigned to type "dict[int | str, (object) -> object] | None"
      "dict[str, partial[Timestamp]]" is incompatible with "dict[int | str, (object) -> object]"
        Type parameter "_KT@dict" is invariant, but "str" is not the same as "int | str"
        Type parameter "_VT@dict" is invariant, but "partial[Timestamp]" is not the same as "(object) -> object"
      "dict[str, partial[Timestamp]]" is incompatible with "None" (reportGeneralTypeIssues)
  /home/user/t.py:9:8 - error: No overloads for "read_excel" match the provided arguments (reportGeneralTypeIssues)
  /home/user/t.py:12:16 - error: Argument of type "dict[str, partial[Timestamp]]" cannot be assigned to parameter "converters" of type "dict[int | str, (object) -> object] | None" in function "read_excel"
    Type "dict[str, partial[Timestamp]]" cannot be assigned to type "dict[int | str, (object) -> object] | None"
      "dict[str, partial[Timestamp]]" is incompatible with "dict[int | str, (object) -> object]"
        Type parameter "_KT@dict" is invariant, but "str" is not the same as "int | str"
        Type parameter "_VT@dict" is invariant, but "partial[Timestamp]" is not the same as "(object) -> object"
      "dict[str, partial[Timestamp]]" is incompatible with "None" (reportGeneralTypeIssues)
  /home/user/t.py:15:13 - information: Type of "df_1" is "DataFrame"
  /home/user/t.py:16:13 - information: Type of "df_2" is "Unknown"

Please complete the following information:

  • WSL - ubuntu 22.04
  • python version 3.10.12
  • version of type checker: pyright==1.1.345, mypy==1.8.0
  • version of installed pandas-stubs: pandas_stubs==1.4.4.220919 (also present on latest pandas_stubs version for most recent pandas version)

Additional context Not sure if its something with the converters type or read_excel, but using a lambda instead of partial results in both type checkers not liking it even though read_csv has no issue with either ie

converters={"field": lambda s: pd.to_datetime(s, errors="coerce")}
#vs
converters={"field": partial(pd.to_datetime, errors="coerce")}

oliverbeagley-pgg avatar Jan 11 '24 15:01 oliverbeagley-pgg

Thanks for the report. I think this is a pyright issue, because the following code reports a similar error:

from functools import partial
from typing import Callable

basetwo = partial(int, base=2)


def foo(func: Callable[[object], object], value: str):
    print(func(value))


foo(basetwo, "1011")

Created an issue for pyright: https://github.com/microsoft/pyright/issues/6961

Will leave this open pending feedback there.

Dr-Irv avatar Jan 11 '24 15:01 Dr-Irv

Based on the discussion with pyright, it looks like the proper fix is to change Callable[[object], object] to Callable[[Any], Any] in pandas-stubs/io/excel/_base.pyi for the converters argument.

PR with tests welcome.

Dr-Irv avatar Jan 11 '24 17:01 Dr-Irv

I can look to make that change, would it also be worth looking at the other io methods that use converters and see if they need to be changed also?

oliverbeagley-pgg avatar Jan 12 '24 09:01 oliverbeagley-pgg

I can look to make that change, would it also be worth looking at the other io methods that use converters and see if they need to be changed also?

Sure, but you'll have to see if the API is consistent across the other API's. I did see that we have ConvertersArg defined in _typing.pyi, so it might be better to change that and use it throughout, but only if the API is consistent (i.e., the args passed to converters are similar across the different io methods)

Also, now that I think about it, for the Excel case, the right argument type might be Callable[[Any], Scalar] because the function should return something that would normally be stored inside of a DataFrame, i.e., a Scalar . Not sure if that will work with partial, and, if not, then Callable[[Any], Any] is probably the way to go. Feel free to open a PR either way.

Dr-Irv avatar Jan 12 '24 15:01 Dr-Irv