siuba icon indicating copy to clipboard operation
siuba copied to clipboard

Add type annotations to siuba verbs and improve type compliance

Open sotte opened this issue 3 years ago • 2 comments

First of all, I really like siuba and I use it a lot :pray:

siuba would be even better, if it a) had type annotations for the core verbs and b) type checkers like pyright/mypy/etc would not complain so much. (I tend to turn off type checking for my siuba code blocks.)

This ticket is related to https://github.com/machow/siuba/issues/418


Note: siuba uses decorators and singledispatch quite a bit. Sadly, getting type annotations right with decorators can be a bit painful in my experience :/

sotte avatar Sep 24 '22 08:09 sotte

I just played with overload and singledispatch. This is mostly a note to myself, and maybe it's helpful for others.

The combination of singledispatch and overload is tricky and mypy and pyright react different.

Here is a minimal example (still quite verbose) with the right type annotations. The implementation is not as nice as singledispatch2/verb_dispatch though.

from functools import singledispatch
from typing import overload

from typing_extensions import reveal_type
import pandas as pd


@singledispatch
def _mutate(x: None | int | pd.DataFrame | pd.Series) -> None | int | pd.DataFrame:
    raise TypeError(type(x))


@_mutate.register
def _(x: None) -> None:
    return x


@_mutate.register
def _(x: int) -> int:
    return x + 1


@_mutate.register
def _(x: pd.DataFrame) -> pd.DataFrame:
    return x


@_mutate.register
def _(x: pd.Series) -> pd.DataFrame:
    return x.to_frame()


@overload
def mutate(x: None) -> None: ...
@overload
def mutate(x: int) -> int: ...
@overload
def mutate(x: pd.Series) -> pd.DataFrame: ...
@overload
def mutate(x: pd.DataFrame) -> pd.DataFrame: ...


def mutate(x: None | int | pd.DataFrame | pd.Series) -> None | int | pd.DataFrame:
    return _mutate(x)


if __name__ == "__main__":
    res = mutate(None)
    reveal_type(res)

    res = mutate(15)
    reveal_type(res)

    data = pd.DataFrame({})
    # assert isinstance(data, pd.DataFrame)
    res = mutate(data)
    reveal_type(res)

    data = pd.Series(data=[1.0], index=[0])
    res = mutate(data)
    reveal_type(res)

This works with pyright

pyright baz.py
...
pyright 1.1.272
/home/stefan/tmp/advanced_typing/baz.py
  /home/stefan/tmp/advanced_typing/baz.py:49:17 - information: Type of "res" is "None"
  /home/stefan/tmp/advanced_typing/baz.py:52:17 - information: Type of "res" is "int"
  /home/stefan/tmp/advanced_typing/baz.py:57:17 - information: Type of "res" is "DataFrame"
  /home/stefan/tmp/advanced_typing/baz.py:61:17 - information: Type of "res" is "DataFrame"
0 errors, 0 warnings, 4 informations
Completed in 1.32sec

but mypy complaints

mypy baz.py
baz.py:49: note: Revealed type is "None"
baz.py:52: note: Revealed type is "builtins.int"
baz.py:56: error: Incompatible types in assignment (expression has type "DataFrame", variable has type "Optional[int]")
baz.py:57: note: Revealed type is "builtins.int"
baz.py:59: error: Incompatible types in assignment (expression has type "Series[Any]", variable has type "DataFrame")
baz.py:60: error: Incompatible types in assignment (expression has type "DataFrame", variable has type "Optional[int]")
baz.py:61: note: Revealed type is "builtins.int"
Found 3 errors in 1 file (checked 1 source file)

I guess in a perfect world one would like a decorator @siuba_verb that combines the overload and the singledispatch.

sotte avatar Sep 24 '22 12:09 sotte

Thanks for the nudge and overload example! I'll add overloads to at least resolve #418.

There are some very relevant mypy issues open for better support singledispatch support:

  • Extended discussion in: https://github.com/python/mypy/issues/11727
  • See options 3 & 5 in this comment: https://github.com/python/mypy/issues/2922#issuecomment-283038234

I think the issue with overload is that it needs to be with the initial function declaration, but singledispatch.register can happen across modules / libraries. From the issues though, it seems like there many be a few reasonable solutions..

machow avatar Sep 25 '22 14:09 machow