Closes https://github.com/rapidsai/cudf/issues/7389

Aug 02 '22 19:08 brandon-b-miller

@brandon-b-miller it looks like there's one code path somewhere that is (potentially erroneously) relying on the old behavior.

Aug 04 '22 18:08 vyasr

Codecov Report

:exclamation: No coverage uploaded for pull request base (branch-22.10@e431440). Click here to learn what that means. The diff coverage is n/a.

:exclamation: Current head e500fe5 differs from pull request most recent head b2b22c8. Consider uploading reports for the commit b2b22c8 to get more accurate results

@@               Coverage Diff               @@
##             branch-22.10   #11441   +/-   ##
===============================================
  Coverage                ?   86.41%           
===============================================
  Files                   ?      145           
  Lines                   ?    22975           
  Branches                ?        0           
===============================================
  Hits                    ?    19853           
  Misses                  ?     3122           
  Partials                ?        0

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Aug 08 '22 19:08 codecov[bot]

Appears on the pandas side, there's some agreement that pandas nullable-int dtype should match the pandas non-nullable int dtype for this operation, although it differs from numpy: https://github.com/pandas-dev/pandas/issues/48223

In [1]: pd.__version__
Out[1]: '1.6.0.dev0+3.g1f41ff07d1'

In [2]: 1 // pd.Series([0], dtype='Int64')  # changed in 1.5
Out[2]:
0    inf
dtype: Float64

In [3]: 1 // pd.Series([0], dtype='int64')
Out[3]:
0    inf
dtype: float64

In [4]: 1 // np.array([0], dtype=np.int64)
<ipython-input-4-9d5b9660db55>:1: RuntimeWarning: divide by zero encountered in floor_divide
  1 // np.array([0], dtype=np.int64)
Out[4]: array([0])

But as mentioned in https://github.com/rapidsai/cudf/issues/7389#issuecomment-1224880765, if cuDF would like to implement a more realistic (from a math perspective) result from this operation, IMO it wouldn't be a bad thing.

Aug 24 '22 21:08 mroeschke

I'm completely happy with casting to float and returning inf, especially if that's where pandas is headed as well. The only question I have is can we avoid the data scan - @mroeschke in pandas >1.5, does it specifically check for zero and then cast, or would we get Float64 for this operation too? I am guessing it ends up int matching the non nullable behavior meaning we would still need to post-process.

1 // pd.Series([2], dtype='Int64')

Aug 24 '22 22:08 brandon-b-miller

In the 1 / 0 case, pandas has special logic to take the 1 / 0 = 0 result from numpy and replace it with inf

https://github.com/pandas-dev/pandas/blob/e0cf2645095a5164ea7a7b143097bf0051f11481/pandas/core/ops/missing.py#L130

For non divide by zero results, the nullable types (should) match the non-nullable types (and numpy) and return int

In [1]: 1 // pd.Series([2], dtype='Int64')
Out[1]:
0    0
dtype: Int64

In [2]: 1 // pd.Series([2], dtype=np.int64)
Out[2]:
0    0
dtype: int64

In [4]: 1 // np.array([2])
Out[4]: array([0])

Aug 25 '22 23:08 mroeschke

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

Sep 25 '22 00:09 github-actions[bot]

#12074 fixes this to match modern panda (and would close this instead)

Nov 10 '22 17:11 wence-

PR is superseded by https://github.com/rapidsai/cudf/pull/12074

Nov 16 '22 18:11 quasiben

cudf
cudf copied to clipboard

Return zero when floor dividing an integer data by zero

Codecov Report

cudf cudf copied to clipboard

Return zero when floor dividing an integer data by zero

Codecov Report

cudf
cudf copied to clipboard