fix: add List, Dict, Tuple and NamedTuple to the GenericDType bound

Open sam-goodwin opened this issue 1 year ago • 6 comments

Closes https://github.com/unionai-oss/pandera/issues/1555

sam-goodwin avatar Apr 04 '24 17:04 sam-goodwin

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.07%. Comparing base (4df61da) to head (8ab65a5). Report is 75 commits behind head on main.

:exclamation: Current head 8ab65a5 differs from pull request most recent head 9dc8ed5. Consider uploading reports for the commit 9dc8ed5 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1556       +/-   ##
===========================================
- Coverage   94.29%   83.07%   -11.22%     
===========================================
  Files          91      111       +20     
  Lines        7024     8191     +1167     
===========================================
+ Hits         6623     6805      +182     
- Misses        401     1386      +985     

codecov[bot] avatar Apr 04 '24 17:04 codecov[bot]

Thanks @sam-goodwin, see https://pandera.readthedocs.io/en/latest/CONTRIBUTING.html#set-up-pre-commit for steps to make sure linters and unit tests are passing. You'll also need to sign your commits: https://pandera.readthedocs.io/en/latest/CONTRIBUTING.html#dco-signing-commits

cosmicBboy avatar Apr 10 '24 16:04 cosmicBboy

Mypy errors:

tests/core/test_typing.py:498: error: "list" is not subscriptable, use "typing.List" instead  [misc]
tests/core/test_typing.py:499: error: "dict" is not subscriptable, use "typing.Dict" instead  [misc]
tests/core/test_typing.py:500: error: "tuple" is not subscriptable, use "typing.Tuple" instead  [misc]

Note that pandera needs to support Python 3.8 as well, so we need to use the generic types from the typing module.
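For reference, a minimal sketch of the typing-module spellings that mypy is asking for. It uses only the standard library; the alias names and the `Point` class are hypothetical and are not taken from the pandera test suite.

```python
from typing import Dict, List, NamedTuple, Tuple, get_args, get_origin


class Point(NamedTuple):
    """Hypothetical NamedTuple, used only for illustration."""

    x: int
    y: int


# typing generics are subscriptable on Python 3.8, whereas the builtin
# spellings list[int], dict[str, int] and tuple[int, int] need Python 3.9+.
ListOfInt = List[int]
DictStrInt = Dict[str, int]
PairOfInt = Tuple[int, int]

# The typing aliases still resolve to the builtin container types.
for alias in (ListOfInt, DictStrInt, PairOfInt):
    print(alias, "->", get_origin(alias), get_args(alias))
```

The annotations carry the same information as the builtin generics, so switching spellings should not change what the test expresses.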

Failing unit test:

FAILED tests/core/test_typing.py::test_complex_python_collection_types - pandera.errors.SchemaError: expected series 'list' to have type list[pandera.dtypes.Int32]:
failure cases:
   index failure_case
0      0       [1, 2]
1      1    [3, 4, 5]

Looks like you need to use the built-in int type? pandera.dtypes.Int32 translates to the numpy dtype for pandas columns.

cosmicBboy avatar Apr 19 '24 15:04 cosmicBboy

> Looks like you need to use the built-in int type? pandera.dtypes.Int32 translates to the numpy dtype for pandas columns.

Do you mean we can't specify ints with specific precision in a List or Dict in pandera?

sam-goodwin avatar Apr 20 '24 21:04 sam-goodwin

> Do you mean we can't specify ints with specific precision in a List or Dict in pandera?

This just follows the way pandas deals with data. Columns containing list or dict objects are just Python objects, meaning they're not numpy arrays. This might be different for pyarrow data representations, but that'll be something to tackle when adding pyarrow support: https://github.com/unionai-oss/pandera/issues/1262.

In summary, pandera.dtypes.Int32 maps onto a numpy.int32, and a list[numpy.int32] isn't meaningful in the context of pandas. list[int] is, though, and such a column will contain plain lists of Python ints.
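A small pandas-only sketch of the point above, assuming pandas and numpy are installed. The variable names are made up for illustration and pandera itself is not involved.

```python
import numpy as np
import pandas as pd

# A column whose cells are Python lists is stored with the generic object dtype.
col = pd.Series([[1, 2], [3, 4, 5]], name="list")
print(col.dtype)

# Every element inside the cells is a built-in int, not a numpy.int32 scalar.
cell_types = {type(item) for cell in col for item in cell}
print(cell_types)

# A flat integer column, by contrast, can carry a fixed-width numpy dtype.
flat = pd.Series([1, 2, 3], dtype="int32")
print(flat.dtype == np.dtype("int32"))
```

Because the list cells are opaque Python objects to pandas, there is no fixed-width integer type to check against inside them, which is why an annotation over the built-in int is the one that matches the data.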

cosmicBboy avatar Apr 24 '24 16:04 cosmicBboy

@sam-goodwin friendly ping: one of the unit tests is still failing: https://github.com/unionai-oss/pandera/actions/runs/8861081819/job/24332580434?pr=1556

cosmicBboy avatar May 10 '24 15:05 cosmicBboy