cudf [BUG] Pymod operation is missing `null` values

Describe the bug When there is 0 in denominator of pymod operation, we seem to be returning garbage value instead of null value.

Steps/Code to reproduce bug

Type "help", "copyright", "credits" or "license" for more information.
>>> 1 % 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> import pandas as pd
>>> x = pd.Series([1, 2, 3, 4, 0])
>>> x  % 2
0    1
1    0
2    1
3    0
4    0
dtype: int64
>>> 2 % x
0    0.0
1    0.0
2    2.0
3    2.0
4    NaN
dtype: float64

>>> import cudf
>>> x = cudf.from_pandas(x)
>>> x % 2
0    1
1    0
2    1
3    0
4    0
dtype: int64
>>> 2 % x
0             0
1             0
2             2
3             2
4    4294967295   #<- This could either be `nan` or `null`
dtype: int64

Expected behavior Return either nan or null inplace of garbage values

Environment overview (please complete the following information)

Environment location: Docker
Method of cuDF install: from source (0.15)

Additional context This kind of looks similar to https://github.com/rapidsai/cudf/issues/5722 but not sure if they share the same code-flows, so Just putting it out there.

Aug 11 '20 22:08 galipremsagar

I guess it makes sense for a modulo zero operation to return null since it's undefined behavior but this is somewhere we would differ from Pandas.

Aug 12 '20 00:08 kkraus14

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

Feb 16 '21 21:02 github-actions[bot]

cc @brandon-b-miller for more binaryop null things

Feb 24 '21 21:02 kkraus14

Binary operations in libcudf are designed so that nulls always propagate. As a result, although we have created a specialized mod operator PYMOD in libcudf for compatibility with Python's rules for handling negatives, we can't trivially exploit that for null handling since the operator doesn't return a nullable value.

However, in this instance there is some precedent for solving this at the libcudf level. Specifically, all the Null* operators have special handling to bypass null propagation in the wrapper for ops. While I would normally say that we should solve this issue by post-processing the result in Python (similar to how #11441 solves #7389), the existing infrastructure for solving this problem makes me less wary of introducing this logic into C++.

@jrhemstad I'd be curious what you think about this. The tl;dr is that in pandas x % 0 returns nulls, and we're trying to figure out how best to do that. Options are (1) using the same wrapping logic that we currently use in libcudf for the Null* operators, which I find slightly icky (but only slightly since we're already doing this, it's just a little less obvious why you'd do this for an operator that isn't strictly about specialized null handling) or (2) just post-processing the result in Python based on whether the input values contained zeros.

CC @brandon-b-miller

Aug 04 '22 18:08 vyasr

although we have created a specialized mod operator PYMOD in libcudf for compatibility with Python's rules for handling negatives

"PYMOD" is a misnomer. It wasn't intended to be 100% compliant with Python's %. It was intended to just be the mathematical modulo operation vs C++'s % is the remainder operation.

See: https://stackoverflow.com/questions/13683563/whats-the-difference-between-mod-and-remainder

I wanted to rename MOD to REMAINDER and PYMOD would be MODULO.

While I would normally say that we should solve this issue by post-processing the result in Python

This is the correct answer. It's no different than https://github.com/rapidsai/cudf/issues/7389#issuecomment-1194572293 imo.

Aug 04 '22 19:08 jrhemstad

Found the original conversation: https://github.com/rapidsai/cudf/pull/1985#pullrequestreview-248424619

What's worse is calling an operation PyMod. It doesn't tell a user of the C++ interface what it actually means. In general, we should use descriptions that stand on their own and do not resort to knowledge of another language or library.

I still agree with my past self :)

Aug 04 '22 19:08 jrhemstad

I'm fine with that explanation, although to @harrism's point we'd definitely need to document how we define remainder vs modulo since it's at best it's a nonstandard but common distinction. So perhaps we should do that rename while also addressing this issue here :)

I think resolving this issue then requires two things:

Rename PYMOD to REMAINDER (along with suitable documentation of the meaning)
Add the same kind of logic as #11441 except for modulo.

@brandon-b-miller would be up for handling this? I'm hoping it would be a near carbon copy of #11441 except with a different fill value. I can take on 1 myself if you don't want to.

Aug 04 '22 19:08 vyasr

happy to handle making the change 👍

Aug 04 '22 19:08 brandon-b-miller