feat: add remaining pyarrow types

Open aaravind100 opened this issue 1 year ago • 3 comments

Add the remaining pyarrow types as per the conversation in #1676.

The following types are added:

  • null
  • date32
  • date64
  • duration
  • float16
  • time32
  • time64
  • map_
  • binary
  • large_binary
  • large_string

Note: null requires the field's nullable option to be set to True for validation to succeed.

aaravind100 avatar Jun 29 '24 09:06 aaravind100

Codecov Report

Attention: Patch coverage is 97.91667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 82.77%. Comparing base (812b2a8) to head (e605423). Report is 150 commits behind head on main.

Files with missing lines            Patch %   Lines
pandera/engines/pandas_engine.py    97.91%    2 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1720       +/-   ##
===========================================
- Coverage   94.28%   82.77%   -11.52%     
===========================================
  Files          91      117       +26     
  Lines        7013     8811     +1798     
===========================================
+ Hits         6612     7293      +681     
- Misses        401     1518     +1117     


codecov[bot] avatar Jun 29 '24 09:06 codecov[bot]

@cosmicBboy any suggestions for fixing this test?

FAILED tests/core/test_pandas_engine.py::test_pandas_data_type_coerce[ArrowMap] - assert 0 > 0

It does appear to raise a ParserError, but the failure-case count on the error is 0.

aaravind100 avatar Jun 29 '24 09:06 aaravind100

@aaravind100 it looks like the underlying error is:

NotImplementedError: Converting strings to map<int64, int64> is not implemented

It looks like this test case needs to be updated to contain a list of dicts where keys are ints and values are strings? https://github.com/unionai-oss/pandera/pull/1720/files#diff-358f62ffd0dea15c0dcd1efc34883303f1d085f35ebb8d3923c781872f6eb563R317-R325

cosmicBboy avatar Jun 30 '24 17:06 cosmicBboy

@cosmicBboy implementing a coerce value method fixed it. Without that method, it was not able to get the failure cases.

aaravind100 avatar Jun 30 '24 21:06 aaravind100

Rebased and fixed all the failing tests :smile:

There was a weird case where pandas 2.0.3 was sneaking into the nox "tests(extra='core', pydantic='2.3.0', python='3.8', pandas='1.5.3')" session.

https://github.com/unionai-oss/pandera/actions/runs/9735487403/job/26864752599?pr=1720#step:7:133

aaravind100 avatar Jul 01 '24 09:07 aaravind100

thanks @aaravind100 !

cosmicBboy avatar Jul 02 '24 18:07 cosmicBboy