Implement ExtensionArray _accumulate and _reduce

Open MichaelTiemannOSC opened this issue 1 year ago • 5 comments

Describe the bug The stubs for ExtensionArray (in pandas-stubs/core/arrays/base.pyi) do not provide type signatures for _accumulate and _reduce. These need to be defined in order to properly add typing information to the Pint-Pandas project.

To Reproduce

  1. Minimal Runnable Example:
import numpy as np
import pandas as pd
from typing import reveal_type
from pandas.arrays import IntegerArray
from pandas.api.extensions import ExtensionArray

_data: ExtensionArray = IntegerArray(values=np.array([1, 2, 3], dtype=int), mask=np.array([True, True, True], dtype=bool))
if isinstance(_data, ExtensionArray):
    reveal_type(_data)
    reveal_type(_data._accumulate)
    reveal_type(_data._reduce)
  2. Using mypy
  3. Show the error message received from that type checker while checking your example.
(pint-dev) % pre-commit run mypy --files foo.py
mypy.....................................................................Failed
- hook id: mypy
- duration: 1.41s
- exit code: 1

foo.py:9: note: Revealed type is "pandas.core.arrays.base.ExtensionArray"
foo.py:10: error: "ExtensionArray" has no attribute "_accumulate"  [attr-defined]
foo.py:10: note: Revealed type is "Any"
foo.py:11: error: "ExtensionArray" has no attribute "_reduce"  [attr-defined]
foo.py:11: note: Revealed type is "Any"
Found 2 errors in 1 file (checked 1 source file)

Note that running the script in Python works, because it uses the actual pandas code, not pandas-stubs:

(pint-dev) % python foo.py
Runtime type is 'IntegerArray'
Runtime type is 'method'
Runtime type is 'method'

Please complete the following information:

  • OS: Mac OS
  • OS Version 14.1.2
  • python 3.11.4
  • mypy 1.8.0
  • version of installed pandas-stubs: 2.1.4.231227

MichaelTiemannOSC avatar Jan 13 '24 22:01 MichaelTiemannOSC

While they look very much private, they are documented: https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._accumulate.html https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._reduce.html and could therefore probably be added to pandas-stubs? @Dr-Irv

twoertwein avatar Jan 14 '24 03:01 twoertwein

Agreed. PR with tests welcome

Dr-Irv avatar Jan 15 '24 15:01 Dr-Irv

I'm glad to see it's a simple case, but alas, it's just beyond my level of Python and mypy type algebras.

MichaelTiemannOSC avatar Jan 15 '24 19:01 MichaelTiemannOSC

I cannot see any ExtensionArray-specific tests. @Dr-Irv, can you advise on where they should be located?

mutricyl avatar Apr 30 '24 11:04 mutricyl

I would add something to test_extension.py, but you can just add a test that asserts the types of _reduce() and _accumulate() to be Callable with appropriate arguments and return types.

Dr-Irv avatar Apr 30 '24 13:04 Dr-Irv

I have added the following to core/arrays/base.pyi:

    def _reduce(self, name: str, *, skipna: bool = ..., keepdims: bool = ..., **kwargs) -> Scalar: ...
    def _accumulate(self, name: str, *, skipna: bool = ..., **kwargs) -> Self: ...

But now I am facing issues with tests:

  • I am struggling to use assert_type on a Callable with multiple arguments (including optional ones and kwargs). I cannot find examples where this is tested. mypy and pyright seem to handle such arguments in slightly different ways.
    • mypy: error: Expression is of type "Callable[[str, DefaultNamedArg(bool, 'skipna'), DefaultNamedArg(bool, 'keepdims'), KwArg(Any)], str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex]", not "Callable[[], str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex]" [assert-type]
  • pyright also complains about https://github.com/pandas-dev/pandas-stubs/blob/feebd4707e594fd1ba7d30fd54e38885872d5ddd/tests/extension/decimal/array.py#L248 where _reduce is also defined for a subclass of ExtensionArray
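For context on the first bullet: the plain Callable[[...], R] syntax has no way to spell keyword-only or defaulted parameters, which is why mypy prints DefaultNamedArg and KwArg in its message. A callback protocol can describe such a signature. The sketch below is only an illustration under assumptions: ReduceLike and FakeArray are hypothetical names, not part of pandas or pandas-stubs, and FakeArray is a stand-in that does not subclass ExtensionArray. Assigning a bound method to a protocol-typed variable checks compatibility rather than exact equality, so it is a weaker check than assert_type.

```python
from typing import Any, Protocol


class ReduceLike(Protocol):
    """Hypothetical callback protocol mirroring the proposed _reduce stub."""

    def __call__(
        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
    ) -> object: ...


class FakeArray:
    """Hypothetical stand-in for an ExtensionArray subclass (no pandas needed)."""

    def __init__(self, values: list) -> None:
        self._values = list(values)

    def _reduce(
        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
    ) -> object:
        # Only "sum" is supported in this sketch.
        if name == "sum":
            return sum(self._values)
        raise TypeError(f"unsupported reduction: {name}")


# A type checker verifies this assignment structurally against ReduceLike;
# at runtime it is an ordinary assignment of a bound method.
reducer: ReduceLike = FakeArray([1, 2, 3])._reduce
total = reducer("sum", skipna=False)
```

Whether such a check would be acceptable in the pandas-stubs test suite is a separate question that this sketch does not settle.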

not sure about the good first issue tag 😃

mutricyl avatar May 14 '24 10:05 mutricyl

I had another recent case in dealing with Callable with odd arguments, and I think it will be hard to do the assert_type() based on what I've learned.

I'm fine if we don't include a test for this, and just add the declarations for the 2 functions.

As for the _reduce() issue with pyright, for extension arrays, the _reduce() operation could return an object of the dtype of the extension array, which could be anything, so use this instead:

    def _reduce(self, name: str, *, skipna: bool = ..., keepdims: bool = ..., **kwargs) -> object: ...

You may have to change tests/extension/decimal/array.py to return decimal.Decimal for _reduce() in there.
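A minimal sketch of why this combination should type-check, assuming the usual override rule that a subclass may narrow a method's return type: with the base method declared to return object, a subclass returning decimal.Decimal remains a compatible override. BaseArraySketch and DecimalArraySketch are hypothetical stand-ins, not the real ExtensionArray or the DecimalArray in tests/extension/decimal/array.py.

```python
import decimal
from typing import Any


class BaseArraySketch:
    """Hypothetical stand-in for ExtensionArray; only the _reduce signature matters."""

    def _reduce(
        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
    ) -> object:
        raise NotImplementedError(name)


class DecimalArraySketch(BaseArraySketch):
    """Hypothetical subclass that narrows the return type to decimal.Decimal."""

    def __init__(self, values: list[decimal.Decimal]) -> None:
        self._values = values

    def _reduce(
        self, name: str, *, skipna: bool = True, keepdims: bool = False, **kwargs: Any
    ) -> decimal.Decimal:
        # Returning Decimal is a valid override because Decimal is a subtype of object.
        if name != "sum":
            raise NotImplementedError(name)
        return sum(self._values, decimal.Decimal(0))


total = DecimalArraySketch(
    [decimal.Decimal("1.1"), decimal.Decimal("2.2")]
)._reduce("sum")
```

Declaring the base return type as a narrower union such as Scalar would reject this override, since decimal.Decimal does not appear among the Scalar members listed in the mypy message above.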

Agree this is not a good first issue any more, but I think you can do it!

Dr-Irv avatar May 14 '24 11:05 Dr-Irv