pl.lit does not use the correct signed integer remainder
However this also means we must update integer division to be consistent with it. The identity
(a / b) * b + (a % b) == amust always hold.
If I am correct, I don't think floor_div (floordiv) and rem (mod) in polars currently satisfy this. Wouldn't this be grounds for changing the behavior of the mod?
>>> x = -1
>>> y = 3
>>> x == (x % y) + y * (x // y)
True
>>> import polars as pl
>>> x_pl = pl.lit(x)
>>> y_pl = pl.lit(y)
>>> pl.select(x_pl == (x_pl % y_pl) + y_pl * (x_pl // y_pl))
shape: (1, 1)
┌─────────┐
│ literal │
│ --- │
│ bool │
╞═════════╡
│ false │
└─────────┘
>>> pl.select(x == (x % y) + y * (x // y))
shape: (1, 1)
┌─────────┐
│ literal │
│ --- │
│ bool │
╞═════════╡
│ true │
└─────────┘
Originally posted by @eitsupi in https://github.com/pola-rs/polars/issues/10570#issuecomment-1818933911
Yeah I don't know. A much simpler example and description of the difference is that python's modulo operator always returns a result whose sign is equal to the denominator:
>>> -1 % 3
2
>>> 2 % -3
-1
Polars, on the other hand, appears to return a result whose sign is equal to the numerator:
>>> pl.select(pl.lit(-1) % pl.lit(3)) # -1
>>> pl.select(pl.lit(2) % pl.lit(-3)) # 2
Both are mathematically correct, it's just a choice of signage.
@orlp FYI.
Both are mathematically correct, it's just a choice of signage.
It sounds a bit weird; integer division and modulus are in one set, and no programming language mixes them in a way that breaks the PR title equation.
C and rust follows the IEEE standard integer division rule (called truncated div or div towards zero) and python follows the floor division rule (as can be guessed from the __floordiv__ op name):
// in c - IEEE trunc div
int a = 5;
int b = -3;
printf("%d\n", a / b);
printf("%d\n", a % b);
printf("%d\n", (a / b) * b + (a % b));
// -1
// 2
// 5
// in rust - IEEE trunc div
let a: i32 = 5;
let b: i32 = -3;
print!("{}\n", a / b);
print!("{}\n", a % b);
print!("{}\n", (a / b) * b + (a % b));
// -1
// 2
// 5
# in python - floordiv
a = 5
b = -3
print(a // b)
print(a % b)
print((a / b) * b + (a % b))
# -2
# -1
# 5
# in polars - floordiv for quotient, truncdiv for remainder
a = 5
b = -3
print(pl.select(pl.lit(a) // pl.lit(b))[0, 0])
print(pl.select(pl.lit(a) % pl.lit(b))[0, 0])
print(pl.select((pl.lit(a) // pl.lit(b)) * pl.lit(b) + (pl.lit(a) % pl.lit(b)))[0, 0])
# -2
# 2
# 8
As a rust-backed python dataframe library, I feel like choosing either intdiv convention makes sense, but not the current mixture.
This is a bug, but perhaps not in the way you think. The pl.lit constant-folding modulo doesn't map to the correct operator, I forgot to update it when I changed the other arithmetic.
Series do have the correct behavior:
>>> pl.Series([-1]) % pl.Series([3])
shape: (1,)
Series: '' [i64]
[
2
]
>>> pl.Series([-1]) % 3
shape: (1,)
Series: '' [i64]
[
2
]
If there are any other places that also are bugged, please do tell!
It appears that -1 % pl.Series([3]) also does not map to the correct operator, same with -1 // pl.Series([3]).