pandas-stubs
pandas-stubs copied to clipboard
Type checking with `converters=` on `read_excel`
Describe the bug
Pyright doesn't seem to like something about using converters=... with read_excel
To Reproduce
- Provide a minimal runnable
pandasexample that is not properly checked by the stubs.
from functools import partial
import pandas as pd
df_1 = pd.read_excel(
"foo.csv",
converters={"field_1": partial(pd.to_datetime, errors="coerce")},
)
df_2 = pd.read_excel(
"foo.csv",
sheet_name="sheet",
converters={"field_1": partial(pd.to_datetime, errors="coerce")},
)
reveal_type(df_1)
reveal_type(df_2)
- Indicate which type checker you are using (
mypyorpyright).: Both - Show the error message received from that type checker while checking your example.
mypy:
$ mypy t.py
t.py:15: note: Revealed type is "pandas.core.frame.DataFrame"
t.py:16: note: Revealed type is "pandas.core.frame.DataFrame"
Success: no issues found in 1 source file
pyright:
/home/user/t.py
/home/user/t.py:7:16 - error: Argument of type "dict[str, partial[Timestamp]]" cannot be assigned to parameter "converters" of type "dict[int | str, (object) -> object] | None" in function "read_excel"
Type "dict[str, partial[Timestamp]]" cannot be assigned to type "dict[int | str, (object) -> object] | None"
"dict[str, partial[Timestamp]]" is incompatible with "dict[int | str, (object) -> object]"
Type parameter "_KT@dict" is invariant, but "str" is not the same as "int | str"
Type parameter "_VT@dict" is invariant, but "partial[Timestamp]" is not the same as "(object) -> object"
"dict[str, partial[Timestamp]]" is incompatible with "None" (reportGeneralTypeIssues)
/home/user/t.py:9:8 - error: No overloads for "read_excel" match the provided arguments (reportGeneralTypeIssues)
/home/user/t.py:12:16 - error: Argument of type "dict[str, partial[Timestamp]]" cannot be assigned to parameter "converters" of type "dict[int | str, (object) -> object] | None" in function "read_excel"
Type "dict[str, partial[Timestamp]]" cannot be assigned to type "dict[int | str, (object) -> object] | None"
"dict[str, partial[Timestamp]]" is incompatible with "dict[int | str, (object) -> object]"
Type parameter "_KT@dict" is invariant, but "str" is not the same as "int | str"
Type parameter "_VT@dict" is invariant, but "partial[Timestamp]" is not the same as "(object) -> object"
"dict[str, partial[Timestamp]]" is incompatible with "None" (reportGeneralTypeIssues)
/home/user/t.py:15:13 - information: Type of "df_1" is "DataFrame"
/home/user/t.py:16:13 - information: Type of "df_2" is "Unknown"
Please complete the following information:
- WSL - ubuntu 22.04
- python version
3.10.12 - version of type checker:
pyright==1.1.345,mypy==1.8.0 - version of installed
pandas-stubs:pandas_stubs==1.4.4.220919(also present on latest pandas_stubs version for most recent pandas version)
Additional context
Not sure if its something with the converters type or read_excel, but using a lambda instead of partial results in both type checkers not liking it even though read_csv has no issue with either ie
converters={"field": lambda s: pd.to_datetime(s, errors="coerce")}
#vs
converters={"field": partial(pd.to_datetime, errors="coerce")}
Thanks for the report. I think this is a pyright issue, because the following code reports a similar error:
from functools import partial
from typing import Callable
basetwo = partial(int, base=2)
def foo(func: Callable[[object], object], value: str):
print(func(value))
foo(basetwo, "1011")
Created an issue for pyright: https://github.com/microsoft/pyright/issues/6961
Will leave this open pending feedback there.
Based on the discussion with pyright, it looks like the proper fix is to change Callable[[object], object] to Callable[[Any], Any] in pandas-stubs/io/excel/_base.pyi for the converters argument.
PR with tests welcome.
I can look to make that change, would it also be worth looking at the other io methods that use converters and see if they need to be changed also?
I can look to make that change, would it also be worth looking at the other io methods that use converters and see if they need to be changed also?
Sure, but you'll have to see if the API is consistent across the other API's. I did see that we have ConvertersArg defined in _typing.pyi, so it might be better to change that and use it throughout, but only if the API is consistent (i.e., the args passed to converters are similar across the different io methods)
Also, now that I think about it, for the Excel case, the right argument type might be Callable[[Any], Scalar] because the function should return something that would normally be stored inside of a DataFrame, i.e., a Scalar . Not sure if that will work with partial, and, if not, then Callable[[Any], Any] is probably the way to go. Feel free to open a PR either way.