polars
polars copied to clipboard
% operator is inconsistent with Python / Numpy behaviour
What language are you using?
Python
Have you tried latest version of polars?
yes
What version of polars are you using?
0.13.50
What operating system are you using polars on?
Mac OS 12.3.1
What language version are you using
Python 3.10
Describe your bug.
In Rust, (and C, C++, D, C#, F# and Java), the % operator is the remainder. In Python (and Perl and Ruby), % is the modulus.
# data
arr = np.arange(-5, 5)
>>> arr
array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])
# polars
>>> pl.DataFrame({"a": arr}).with_column(pl.col("a") % 5).to_series().to_numpy()
array([ 0, -4, -3, -2, -1, 0, 1, 2, 3, 4])
# numpy
>>> arr % 5
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
# python
>>> [i % 5 for i in arr]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
I would expect the python-style behaviour to be the one used by %, with perhaps the introduction of a pl.remainder.
Having read up a bit about modulus on wikipedia, I am now a bit confused about what the strict definition of modulus is. Still, I'd expect the same behaviour of % in polars as in numpy.
TL;DR.
Whatever you do, make sure that
# Python
assert((n//m)*m + (n%m) == n)
Details:
Modulo '%' is closely related to integer division. Typically, integer division comes first, then the modulo (remainder) is defined from the invariant (n/m)*m + n%m == n. C,Rust,C++ may use different integer division rules (can even be architecture dependent) from Python. Hence the impact on '%'. In any case, polars should maintain the invariant (I have not checked it yet).
This seems portable in current Rust
https://doc.rust-lang.org/stable/std/primitive.i32.html#method.rem_euclid
Again, it makes no sense to change '%' without simultaneously changing integer division (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.div_euclid).
Euclid algo is much slower than the native hardware implementation which goes contrary to polars stated goals.
I will close this as the implementation language is Rust and we follow that behavior in this respect.