cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Return zero when floor dividing an integer data by zero

Open brandon-b-miller opened this issue 2 years ago • 2 comments

Closes https://github.com/rapidsai/cudf/issues/7389

brandon-b-miller avatar Aug 02 '22 19:08 brandon-b-miller

@brandon-b-miller it looks like there's one code path somewhere that is (potentially erroneously) relying on the old behavior.

vyasr avatar Aug 04 '22 18:08 vyasr

Codecov Report

:exclamation: No coverage uploaded for pull request base (branch-22.10@e431440). Click here to learn what that means. The diff coverage is n/a.

:exclamation: Current head e500fe5 differs from pull request most recent head b2b22c8. Consider uploading reports for the commit b2b22c8 to get more accurate results

@@               Coverage Diff               @@
##             branch-22.10   #11441   +/-   ##
===============================================
  Coverage                ?   86.41%           
===============================================
  Files                   ?      145           
  Lines                   ?    22975           
  Branches                ?        0           
===============================================
  Hits                    ?    19853           
  Misses                  ?     3122           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Aug 08 '22 19:08 codecov[bot]

Appears on the pandas side, there's some agreement that pandas nullable-int dtype should match the pandas non-nullable int dtype for this operation, although it differs from numpy: https://github.com/pandas-dev/pandas/issues/48223

In [1]: pd.__version__
Out[1]: '1.6.0.dev0+3.g1f41ff07d1'

In [2]: 1 // pd.Series([0], dtype='Int64')  # changed in 1.5
Out[2]:
0    inf
dtype: Float64

In [3]: 1 // pd.Series([0], dtype='int64')
Out[3]:
0    inf
dtype: float64

In [4]: 1 // np.array([0], dtype=np.int64)
<ipython-input-4-9d5b9660db55>:1: RuntimeWarning: divide by zero encountered in floor_divide
  1 // np.array([0], dtype=np.int64)
Out[4]: array([0])

But as mentioned in https://github.com/rapidsai/cudf/issues/7389#issuecomment-1224880765, if cuDF would like to implement a more realistic (from a math perspective) result from this operation, IMO it wouldn't be a bad thing.

mroeschke avatar Aug 24 '22 21:08 mroeschke

I'm completely happy with casting to float and returning inf, especially if that's where pandas is headed as well. The only question I have is can we avoid the data scan - @mroeschke in pandas >1.5, does it specifically check for zero and then cast, or would we get Float64 for this operation too? I am guessing it ends up int matching the non nullable behavior meaning we would still need to post-process.

1 // pd.Series([2], dtype='Int64')

brandon-b-miller avatar Aug 24 '22 22:08 brandon-b-miller

In the 1 / 0 case, pandas has special logic to take the 1 / 0 = 0 result from numpy and replace it with inf

https://github.com/pandas-dev/pandas/blob/e0cf2645095a5164ea7a7b143097bf0051f11481/pandas/core/ops/missing.py#L130

For non divide by zero results, the nullable types (should) match the non-nullable types (and numpy) and return int

In [1]: 1 // pd.Series([2], dtype='Int64')
Out[1]:
0    0
dtype: Int64

In [2]: 1 // pd.Series([2], dtype=np.int64)
Out[2]:
0    0
dtype: int64

In [4]: 1 // np.array([2])
Out[4]: array([0])

mroeschke avatar Aug 25 '22 23:08 mroeschke

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Sep 25 '22 00:09 github-actions[bot]

#12074 fixes this to match modern panda (and would close this instead)

wence- avatar Nov 10 '22 17:11 wence-

PR is superseded by https://github.com/rapidsai/cudf/pull/12074

quasiben avatar Nov 16 '22 18:11 quasiben