polars icon indicating copy to clipboard operation
polars copied to clipboard

% operator is inconsistent with Python / Numpy behaviour

Open thomasaarholt opened this issue 3 years ago • 4 comments
trafficstars

What language are you using?

Python

Have you tried latest version of polars?

yes

What version of polars are you using?

0.13.50

What operating system are you using polars on?

Mac OS 12.3.1

What language version are you using

Python 3.10

Describe your bug.

In Rust, (and C, C++, D, C#, F# and Java), the % operator is the remainder. In Python (and Perl and Ruby), % is the modulus.

# data
arr = np.arange(-5, 5)
>>> arr
array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

# polars
>>> pl.DataFrame({"a": arr}).with_column(pl.col("a") % 5).to_series().to_numpy()
array([ 0, -4, -3, -2, -1,  0,  1,  2,  3,  4])

# numpy
>>> arr % 5
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

# python
>>> [i % 5 for i in arr]
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

I would expect the python-style behaviour to be the one used by %, with perhaps the introduction of a pl.remainder.

thomasaarholt avatar Jun 27 '22 10:06 thomasaarholt

Having read up a bit about modulus on wikipedia, I am now a bit confused about what the strict definition of modulus is. Still, I'd expect the same behaviour of % in polars as in numpy.

thomasaarholt avatar Jun 27 '22 11:06 thomasaarholt

TL;DR.

Whatever you do, make sure that

# Python
assert((n//m)*m + (n%m) == n)

Details:

Modulo '%' is closely related to integer division. Typically, integer division comes first, then the modulo (remainder) is defined from the invariant (n/m)*m + n%m == n. C,Rust,C++ may use different integer division rules (can even be architecture dependent) from Python. Hence the impact on '%'. In any case, polars should maintain the invariant (I have not checked it yet).

slonik-az avatar Jun 27 '22 11:06 slonik-az

This seems portable in current Rust

https://doc.rust-lang.org/stable/std/primitive.i32.html#method.rem_euclid

rostamn739 avatar Jun 29 '22 12:06 rostamn739

Again, it makes no sense to change '%' without simultaneously changing integer division (https://doc.rust-lang.org/stable/std/primitive.i32.html#method.div_euclid).

Euclid algo is much slower than the native hardware implementation which goes contrary to polars stated goals.

slonik-az avatar Jun 29 '22 12:06 slonik-az

I will close this as the implementation language is Rust and we follow that behavior in this respect.

ritchie46 avatar Oct 21 '22 18:10 ritchie46