pandas-stubs icon indicating copy to clipboard operation
pandas-stubs copied to clipboard

TypeIs annotation for `pands.isna` is not quite accurate

Open steve-mavens opened this issue 1 year ago • 10 comments

Describe the bug

Recently https://github.com/pandas-dev/pandas-stubs/pull/972 introduced a TypeIs annotation for pd.isna. The annotation states that the function returns True if and only if the input value is one of the types NaTType | NAType | None. However, this is not correct, since the function also returns True for the value float('nan'), which is of type float.

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.
from pandas import isna

def main() -> None:
   foo: float = float('nan')
   if isna(foo) and 1 == 1:
       print('not a number')
   else:
       print('is a number')

if __name__ == '__main__':
    main()

The output is not a number, but because of the TypeIs annotation, mypy thinks that line is unreachable

  1. Indicate which type checker you are using (mypy or pyright).

mypy 1.10.1

  1. Show the error message received from that type checker while checking your example.

type_test.py:6: error: Right operand of "and" is never evaluated [unreachable] if isna(foo) and 1 == 1: ^~~~~~ type_test.py:7: error: Statement is unreachable [unreachable] print('not a number') ^~~~~~~~~~~~~~~~~~~~~ Found 2 errors in 1 file (checked 1 source file)

Please complete the following information:

  • OS: [e.g. Windows, Linux, MacOS]: Windows
  • OS Version [e.g. 22]: 11
  • python version: 3.11.4
  • version of type checker: 1.10.1
  • version of installed pandas-stubs: 2.2.2.240909

Additional context Add any other context about the problem here.

steve-mavens avatar Sep 26 '24 14:09 steve-mavens

Python's type system does not provide a way to discriminate between a regular float and a NaN float. I would argue that the current implementation does the most intuitive thing in most cases at the expense of a relatively rare use case. Anecdotally, I have found None, np.nan, and pd.NA to be more common.

We could update to include the Literal[float('NaN')] in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regular float.

gandhis1 avatar Sep 26 '24 14:09 gandhis1

Yes, I think the basic problem here is that TypeIs requires a clean division of types. It doesn't allow for a function that takes input type A | B | C and returns True for all A and some values of C, and False for all B and the remaining values of C.

steve-mavens avatar Sep 26 '24 14:09 steve-mavens

Thanks for the report. I'm not sure what we can do here, because we can't specify float("nan") or np.nan as a Literal.

Dr-Irv avatar Sep 26 '24 14:09 Dr-Irv

One workaround for users would be math.isnan(x) if isinstance(x, float) else pd.isna(x), but obviously there's some performance implications there. It's probably better to suppress the unreachable warning than take the hit.

steve-mavens avatar Sep 26 '24 14:09 steve-mavens

We could update to include the Literal[float('NaN')] in the overloads, but (1) I'm not sure an instantiation like that is actually supported and (2) it may not even solve the issue, if type checkers infer it as a regular float.

I thought that might work, but the typing system doesn't support it. Found this discussion: https://github.com/python/typing/issues/1160

Yes, I think the basic problem here is that TypeIs requires a clean division of types. It doesn't allow for a function that takes input type A | B | C and returns True for all A and some values of C, and False for all B and the remaining values of C.

Well, it might work if we could do Literal[float("nan")], since that would "divide" the types, although I'm not sure if the type checkers are handling that.

Dr-Irv avatar Sep 26 '24 14:09 Dr-Irv

Also, there are allowed to be float NaN values distinct from float('nan'), so it could still go wrong in even rarer cases.

Is there a way to import pandas.api.typing.NAType (or pandas.libs._missing.NAType) that the stubs approve of? The reason I found this quirk of the type hint is actually because of a bug in my real code. I have a variable taken from an unpacked row of a DataFrame, and I typed it as float, when it could also be pd.NA.

steve-mavens avatar Sep 26 '24 14:09 steve-mavens

So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType, or float | None, so that mypy doesn't think the return value of pd.isna is guaranteed False by the input type.

steve-mavens avatar Sep 26 '24 14:09 steve-mavens

Is there a way to import pandas.api.typing.NAType (or pandas.libs._missing.NAType) that the stubs approve of?

Right now, if you do from pandas._libs.missing import NAType , that should work.

We're behind on updating the stubs for all 2.x support, which means that pandas.api.typing doesn't exist yet in the stubs.

Dr-Irv avatar Sep 26 '24 15:09 Dr-Irv

Great, thanks for your help. I'll probably use None until NAType is public in the stubs, since either way I just need to tell mypy that pd.isna might return True.

steve-mavens avatar Sep 26 '24 15:09 steve-mavens

So, another workaround is to make sure variables/expressions that might contain a float NaN are typed as float | NAType, or float | None, so that mypy doesn't think the return value of pd.isna is guaranteed False by the input type.

I ran into the same issue today, and resolved it by typing my variable like value: float | None

gboeing avatar Sep 27 '24 17:09 gboeing