cudf
Return zero when floor dividing integer data by zero
Closes https://github.com/rapidsai/cudf/issues/7389
@brandon-b-miller it looks like there's one code path somewhere that is (potentially erroneously) relying on the old behavior.
Codecov Report
:exclamation: No coverage uploaded for pull request base (branch-22.10@e431440). Click here to learn what that means. The diff coverage is n/a.
:exclamation: Current head e500fe5 differs from pull request most recent head b2b22c8. Consider uploading reports for the commit b2b22c8 to get more accurate results.
@@ Coverage Diff @@
## branch-22.10 #11441 +/- ##
===============================================
Coverage ? 86.41%
===============================================
Files ? 145
Lines ? 22975
Branches ? 0
===============================================
Hits ? 19853
Misses ? 3122
Partials ? 0
On the pandas side, there appears to be some agreement that the pandas nullable-int dtype should match the pandas non-nullable int dtype for this operation, even though that differs from numpy: https://github.com/pandas-dev/pandas/issues/48223
In [1]: pd.__version__
Out[1]: '1.6.0.dev0+3.g1f41ff07d1'
In [2]: 1 // pd.Series([0], dtype='Int64') # changed in 1.5
Out[2]:
0 inf
dtype: Float64
In [3]: 1 // pd.Series([0], dtype='int64')
Out[3]:
0 inf
dtype: float64
In [4]: 1 // np.array([0], dtype=np.int64)
<ipython-input-4-9d5b9660db55>:1: RuntimeWarning: divide by zero encountered in floor_divide
1 // np.array([0], dtype=np.int64)
Out[4]: array([0])
But as mentioned in https://github.com/rapidsai/cudf/issues/7389#issuecomment-1224880765, if cuDF would like to implement a more realistic (from a math perspective) result from this operation, IMO it wouldn't be a bad thing.
I'm completely happy with casting to float and returning inf, especially if that's where pandas is headed as well. The only question I have is can we avoid the data scan - @mroeschke in pandas >1.5, does it specifically check for zero and then cast, or would we get Float64 for this operation too? I am guessing it ends up int, matching the non-nullable behavior, meaning we would still need to post-process.
1 // pd.Series([2], dtype='Int64')
In the 1 // 0 case, pandas has special logic to take the 1 // 0 = 0 result from numpy and replace it with inf:
https://github.com/pandas-dev/pandas/blob/e0cf2645095a5164ea7a7b143097bf0051f11481/pandas/core/ops/missing.py#L130
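For illustration, here is a rough numpy-only sketch of that kind of post-processing. It is not the pandas or cuDF implementation; the function name `floordiv_with_inf` is made up for this example, and it assumes integer inputs where at least one operand is an array. It shows why a check over the divisor is still needed: the result is cast to float only when a zero is actually found.

```python
import numpy as np


def floordiv_with_inf(x, y):
    """Floor-divide integers, mapping division by zero to inf, -inf, or nan.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x)
    y = np.asarray(y)

    # numpy returns 0 for integer floor division by zero and emits a
    # RuntimeWarning; silence it because those entries are patched below.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = x // y

    zero_mask = y == 0  # this comparison is the scan over the divisor
    if not zero_mask.any():
        return result  # no zeros found: keep the integer dtype

    # At least one zero divisor: cast to float so inf and nan fit.
    result = result.astype("float64")
    result[zero_mask & (x > 0)] = np.inf
    result[zero_mask & (x < 0)] = -np.inf
    result[zero_mask & (x == 0)] = np.nan
    return result
```

With this sketch, `floordiv_with_inf(1, np.array([2]))` stays an integer array, while `floordiv_with_inf(1, np.array([0]))` comes back as a float array holding inf.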
For results that do not involve division by zero, the nullable types (should) match the non-nullable types (and numpy) and return int:
In [1]: 1 // pd.Series([2], dtype='Int64')
Out[1]:
0 0
dtype: Int64
In [2]: 1 // pd.Series([2], dtype=np.int64)
Out[2]:
0 0
dtype: int64
In [4]: 1 // np.array([2])
Out[4]: array([0])
This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.
#12074 fixes this to match modern pandas (and would close this instead).
PR is superseded by https://github.com/rapidsai/cudf/pull/12074