polars icon indicating copy to clipboard operation
polars copied to clipboard

pl.lit does not use the correct signed integer remainder

Open eitsupi opened this issue 1 year ago • 5 comments

However this also means we must update integer division to be consistent with it. The identity (a / b) * b + (a % b) == a must always hold.

If I am correct, I don't think floor_div (floordiv) and rem (mod) in polars currently satisfy this. Wouldn't this be grounds for changing the behavior of the mod?

>>> x = -1
>>> y = 3
>>> x == (x % y) + y * (x // y)
True
>>> import polars as pl
>>> x_pl = pl.lit(x)
>>> y_pl = pl.lit(y)
>>> pl.select(x_pl == (x_pl % y_pl) + y_pl * (x_pl // y_pl))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ bool    │
╞═════════╡
│ false   │
└─────────┘
>>> pl.select(x == (x % y) + y * (x // y))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ bool    │
╞═════════╡
│ true    │
└─────────┘

Originally posted by @eitsupi in https://github.com/pola-rs/polars/issues/10570#issuecomment-1818933911

eitsupi avatar Feb 23 '24 04:02 eitsupi

Yeah I don't know. A much simpler example and description of the difference is that python's modulo operator always returns a result whose sign is equal to the denominator:

>>> -1 % 3
2
>>> 2 % -3
-1

Polars, on the other hand, appears to return a result whose sign is equal to the numerator:

>>> pl.select(pl.lit(-1) % pl.lit(3))  # -1
>>> pl.select(pl.lit(2) % pl.lit(-3))  # 2

Both are mathematically correct, it's just a choice of signage.

mcrumiller avatar Feb 23 '24 14:02 mcrumiller

@orlp FYI.

ritchie46 avatar Feb 23 '24 22:02 ritchie46

Both are mathematically correct, it's just a choice of signage.

It sounds a bit weird; integer division and modulus are in one set, and no programming language mixes them in a way that breaks the PR title equation.

C and rust follows the IEEE standard integer division rule (called truncated div or div towards zero) and python follows the floor division rule (as can be guessed from the __floordiv__ op name):

// in c - IEEE trunc div
int a = 5;
int b = -3;
printf("%d\n", a / b);
printf("%d\n", a % b);
printf("%d\n", (a / b) * b + (a % b));
// -1
// 2
// 5
// in rust - IEEE trunc div
let a: i32 = 5;
let b: i32 = -3;
print!("{}\n", a / b);
print!("{}\n", a % b);
print!("{}\n", (a / b) * b + (a % b));
// -1
// 2
// 5
# in python - floordiv
a = 5
b = -3
print(a // b)
print(a % b)
print((a / b) * b + (a % b))
# -2
# -1
# 5
# in polars - floordiv for quotient, truncdiv for remainder
a = 5
b = -3
print(pl.select(pl.lit(a) // pl.lit(b))[0, 0])
print(pl.select(pl.lit(a) % pl.lit(b))[0, 0])
print(pl.select((pl.lit(a) // pl.lit(b)) * pl.lit(b) + (pl.lit(a) % pl.lit(b)))[0, 0])
# -2
# 2
# 8

As a rust-backed python dataframe library, I feel like choosing either intdiv convention makes sense, but not the current mixture.

cjackal avatar Feb 24 '24 06:02 cjackal

This is a bug, but perhaps not in the way you think. The pl.lit constant-folding modulo doesn't map to the correct operator, I forgot to update it when I changed the other arithmetic.

Series do have the correct behavior:

>>> pl.Series([-1]) % pl.Series([3])
shape: (1,)
Series: '' [i64]
[
        2
]
>>> pl.Series([-1]) % 3
shape: (1,)
Series: '' [i64]
[
        2
]

If there are any other places that also are bugged, please do tell!

orlp avatar Feb 24 '24 10:02 orlp

It appears that -1 % pl.Series([3]) also does not map to the correct operator, same with -1 // pl.Series([3]).

orlp avatar Feb 24 '24 10:02 orlp