polars [Python] Expose all operators Expr implements as methods

Describe your feature request

Currently, Polars exposes only some operators as methods of Expr. I propose exposing all of them as methods of Expr.

Advantages:

- Consistent docs: all operators have corresponding functions which document the types.
  - Being able to find the pow function in the docs but none of the other operators caused me no end of confusion.
- Can be more readable when used in code, especially for longer chains.

Additionally, it might make sense to use this as a way to document the operators and what arguments they can take, as I was unable to find anything about that.

Currently implemented operators, and their method version if present:

| dunder operator | method |
| --- | --- |
| `__invert__` | `self.is_not` |
| `__xor__` | |
| `__rxor__` | |
| `__and__` | |
| `__rand__` | |
| `__or__` | |
| `__ror__` | |
| `__add__` | |
| `__radd__` | |
| `__sub__` | |
| `__rsub__` | |
| `__mul__` | |
| `__rmul__` | |
| `__truediv__` | |
| `__rtruediv__` | |
| `__floordiv__` | |
| `__rfloordiv__` | |
| `__rmod__` | |
| `__mod__` | |
| `__pow__` | `self.pow` |
| `__rpow__` | |
| `__ge__` | `self.gt_eq` |
| `__le__` | `self.lt_eq` |
| `__eq__` | `self.eq` |
| `__ne__` | `self.neq` |
| `__lt__` | `self.lt` |
| `__gt__` | `self.gt` |
| `__neg__` | |
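To make the request concrete, here is a minimal sketch of what "exposing an operator as a method" could look like. `ToyExpr` and its `add`, `mul`, and `gt` methods are hypothetical stand-ins written for this illustration; this is not the Polars `Expr` class or its actual implementation. Each named method simply delegates to the matching dunder, so both spellings build the same expression.

```python
from __future__ import annotations


class ToyExpr:
    """Hypothetical expression type for illustration; NOT polars.Expr."""

    def __init__(self, text: str) -> None:
        self.text = text

    def _binary(self, symbol: str, other: object) -> ToyExpr:
        # Render the right-hand side, whether it is an expression or a literal.
        rhs = other.text if isinstance(other, ToyExpr) else repr(other)
        return ToyExpr(f"({self.text} {symbol} {rhs})")

    # Operator (dunder) implementations.
    def __add__(self, other: object) -> ToyExpr:
        return self._binary("+", other)

    def __mul__(self, other: object) -> ToyExpr:
        return self._binary("*", other)

    def __gt__(self, other: object) -> ToyExpr:
        return self._binary(">", other)

    # Named methods that only delegate to the operators above.
    def add(self, other: object) -> ToyExpr:
        return self + other

    def mul(self, other: object) -> ToyExpr:
        return self * other

    def gt(self, other: object) -> ToyExpr:
        return self > other


a = ToyExpr("a")
operator_style = (a + 10) * 2 > 50
method_style = a.add(10).mul(2).gt(50)
print(operator_style.text)
print(method_style.text)
```

Because the methods are thin wrappers, they add no new behaviour; the only difference is how the query reads and where the documentation lives.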

Edit: it would probably make sense to do the same for DataFrame and Series operators, not just Expr

Aug 02 '22 13:08 laundmo

This would create a redundancy and would create differences in how users would write polars queries, which I want to keep to a minimum.

I think I will even follow up by deprecating the comparison methods, as they go against this principle. (I forgot we have those.)

Maybe we could document the dunders we implement to make them more discoverable?

Aug 02 '22 13:08 ritchie46

I'm really not a fan of the way using operators makes queries look, so while understandable, this isn't great to hear.

I don't really think that writing queries in different ways would be more of an issue than it already is. After all, I can already use the dunders directly to write the same query without operators; they're just ugly.

Aug 02 '22 13:08 laundmo

> I don't really think that writing queries in different ways would be more of an issue than it already is. After all, I can already use the dunders directly to write the same query without operators; they're just ugly.

Yep, but that would be frowned upon.

I feel that it is a matter of taste. I think `col(a) + col(b)` is more readable than `col(a).add(col(b))`, so I want to nudge in that direction.

Given my earlier point that I want to keep redundancy in operators to a minimum, I have to choose one.

And then I go for my taste, as I have to look at it most. :)

Aug 03 '22 08:08 ritchie46

Given the many requests for this, I am willing to accept a PR that implements those on the expressions.

Nov 19 '22 11:11 ritchie46

I would also be happy to see an implementation, but I can also understand your objection, @ritchie46. For very simple cases I think you are right that `col(a) + col(b)` is cleaner than `col(a).add(col(b))`. For more complex applications, which are what you mostly see in practice, I agree with @laundmo. Example:

import polars as pl

df = pl.DataFrame({
    "age": [20, 45, 33, 21, 55],
    "height": [1.80, 1.90, 1.70, 1.65, 1.80],
    "weight": [80, 90, 70, 65, 80],

})

# polars
(
    df
    .filter(
        ((pl.col("age") % 10) != 0) &
        (pl.col("height") > 1.75) &
        (
            ((pl.col("weight") + 10) > 80) |
            (pl.col("weight") < 70)
        )
    )
)

# pandas (assumes df is a pandas DataFrame holding the same data)
(
    df[
        df["age"].mod(10).ne(0) &
        df["height"].gt(1.75) &
        (
            df["weight"].add(10).gt(80) |
            df["weight"].lt(70)
        )
    ]
)

With an implementation, however, we would have to think about the naming. The method names suggested above do not match the pandas/dunder names! I think we should stick with the dunder/pandas names (ge, gt, eq, ne) instead of gt_eq, gt, eq, neq.
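As a rough illustration of the dunder-derived naming, the standard library's `operator` module already follows it: stripping the underscores from a dunder gives the function name, with a trailing underscore where the result would be a Python keyword (`and_`, `or_`). The helper `method_name` below is hypothetical and only shows the convention; it says nothing about which names Polars should or will use.

```python
import keyword
import operator

# Non-reflected dunders taken from the table earlier in this thread.
DUNDERS = [
    "__invert__", "__xor__", "__and__", "__or__",
    "__add__", "__sub__", "__mul__", "__truediv__", "__floordiv__",
    "__mod__", "__pow__", "__ge__", "__le__", "__eq__", "__ne__",
    "__lt__", "__gt__", "__neg__",
]


def method_name(dunder: str) -> str:
    """Hypothetical helper: derive a method name from a dunder name."""
    name = dunder.strip("_")
    # `and` and `or` are keywords, so they need a trailing underscore.
    return name + "_" if keyword.iskeyword(name) else name


names = {dunder: method_name(dunder) for dunder in DUNDERS}

for dunder, name in names.items():
    # Every derived name matches a function in the stdlib operator module.
    assert hasattr(operator, name), name
    print(f"{dunder:14} -> {name}")
```

One advantage of this convention is that nobody has to memorize a second vocabulary: anyone who knows the dunder already knows the method name.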

Nov 19 '22 15:11 Julian-J-S

@ritchie46 I think it doesn't matter when you have only a single expression, but operators do break the pattern of your code in cases like this example from the docs:

df.select(
    [
        pl.sum("nrs"),
        pl.col("names").sort(),
        pl.col("names").first().alias("first name"),
        (pl.mean("nrs") * 10).alias("10xnrs"),
    ]
)

I find the code would be clearer if we could keep the pattern and write this:

df.select(
    [
        pl.sum("nrs"),
        pl.col("names").sort(),
        pl.col("names").first().alias("first name"),
        pl.mean("nrs").mul(10).alias("10xnrs"),
    ]
)

Or maybe have a method .do() that takes any math operation, for example .do(+5) or .do(*8), so:

df.select(
    [
        pl.sum("nrs"),
        pl.col("names").sort(),
        pl.col("names").first().alias("first name"),
        pl.mean("nrs").do(*10).alias("10xnrs"),
    ]
)
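Read literally, `.do(*10)` would not mean "multiply by 10" in Python: `*10` in a call is argument-unpacking syntax, and `+5` is just the number 5. A method along these lines would therefore need to receive the operation as its own argument. The sketch below uses a hypothetical `Chain` wrapper around plain numbers together with the standard library's `operator` functions; it is not a Polars API, only an illustration of the shape such a method could take.

```python
import operator
from typing import Any, Callable


class Chain:
    """Hypothetical wrapper for illustration; NOT part of Polars."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def do(self, op: Callable[[Any, Any], Any], operand: Any) -> "Chain":
        # Apply a binary operation and keep the chain going.
        return Chain(op(self.value, operand))


# Stand-in for the mean of a column; the number is made up for the example.
mean_nrs = 4.5

scaled = Chain(mean_nrs).do(operator.mul, 10).do(operator.add, 5)
print(scaled.value)
```

Compared with one named method per operator, a single generic method keeps the API surface small, at the cost of a more verbose call site.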

I would also argue that the code can get a bit messy under the current implementation for a complex calculation, since the only way to define the order of operations is with parentheses. For example:

df.select(
    [
        pl.sum("nrs"),
        pl.col("names").sort(),
        pl.col("names").first().alias("first name"),
        (((pl.mean("nrs") - pl.col("nrs")) * 10 + 10) / (pl.col("nrs") * 100)).alias("10xnrs"),
    ]
)

If we had a div() method, we could at least break it into two parts, I guess. I really hope you will consider adding some math methods.

Mar 13 '23 02:03 stevenlis

Could you make a PR for this?

Mar 13 '23 08:03 ritchie46

@ritchie46 I would love to, but it's really beyond my technical skill 😅. By the way, shouldn't this be tracked in a separate issue? I think the OP was asking for something different.

Mar 13 '23 13:03 stevenlis

I'll take care of this one.

Mar 20 '23 06:03 alexander-beedie
