pandas-stubs
pandas-stubs copied to clipboard
TypeIs annotation for `pands.isna` is not quite accurate
Describe the bug
Recently https://github.com/pandas-dev/pandas-stubs/pull/972 introduced a TypeIs annotation for pd.isna. The annotation states that the function returns True if and only if the input value is one of the types NaTType | NAType | None. However, this is not correct, since the function also returns True for the value float('nan'), which is of type float.
To Reproduce
- Provide a minimal runnable
pandasexample that is not properly checked by the stubs.
from pandas import isna
def main() -> None:
foo: float = float('nan')
if isna(foo) and 1 == 1:
print('not a number')
else:
print('is a number')
if __name__ == '__main__':
main()
The output is not a number, but because of the TypeIs annotation, mypy thinks that line is unreachable
- Indicate which type checker you are using (
mypyorpyright).
mypy 1.10.1
- Show the error message received from that type checker while checking your example.
type_test.py:6: error: Right operand of "and" is never evaluated [unreachable] if isna(foo) and 1 == 1: ^~~~~~ type_test.py:7: error: Statement is unreachable [unreachable] print('not a number') ^~~~~~~~~~~~~~~~~~~~~ Found 2 errors in 1 file (checked 1 source file)
Please complete the following information:
- OS: [e.g. Windows, Linux, MacOS]: Windows
- OS Version [e.g. 22]: 11
- python version: 3.11.4
- version of type checker: 1.10.1
- version of installed
pandas-stubs: 2.2.2.240909
Additional context Add any other context about the problem here.
Python's type system does not provide a way to discriminate between a regular float and a NaN float. I would argue that the current implementation does the most intuitive thing in most cases at the expense of a relatively rare use case. Anecdotally, I have found None, np.nan, and pd.NA to be more common.
We could update to include the Literal[float('NaN')] in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regular float.
Yes, I think the basic problem here is that TypeIs requires a clean division of types. It doesn't allow for a function that takes input type A | B | C and returns True for all A and some values of C, and False for all B and the remaining values of C.
Thanks for the report. I'm not sure what we can do here, because we can't specify float("nan") or np.nan as a Literal.
One workaround for users would be math.isnan(x) if isinstance(x, float) else pd.isna(x), but obviously there's some performance implications there. It's probably better to suppress the unreachable warning than take the hit.
We could update to include the
Literal[float('NaN')]in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regularfloat.
I thought that might work, but the typing system doesn't support it. Found this discussion: https://github.com/python/typing/issues/1160
Yes, I think the basic problem here is that TypeIs requires a clean division of types. It doesn't allow for a function that takes input type A | B | C and returns True for all A and some values of C, and False for all B and the remaining values of C.
Well, it might work if we could do Literal[float("nan")], since that would "divide" the types, although I'm not sure if the type checkers are handling that.
Also, there are allowed to be float NaN values distinct from float('nan'), so it could still go wrong in even rarer cases.
Is there a way to import pandas.api.typing.NAType (or pandas.libs._missing.NAType) that the stubs approve of? The reason I found this quirk of the type hint is actually because of a bug in my real code. I have a variable taken from an unpacked row of a DataFrame, and I typed it as float, when it could also be pd.NA.
So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType, or float | None, so that mypy doesn't think the return value of pd.isna is guaranteed False by the input type.
Is there a way to import
pandas.api.typing.NAType(orpandas.libs._missing.NAType) that the stubs approve of?
Right now, if you do from pandas._libs.missing import NAType , that should work.
We're behind on updating the stubs for all 2.x support, which means that pandas.api.typing doesn't exist yet in the stubs.
Great, thanks for your help. I'll probably use None until NAType is public in the stubs, since either way I just need to tell mypy that pd.isna might return True.
So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType, or float | None, so that mypy doesn't think the return value of pd.isna is guaranteed False by the input type.
I ran into the same issue today, and resolved it by typing my variable like value: float | None