Deep equality of pl.Expr trees

Open OneRaynyDay opened this issue 1 year ago • 3 comments

Describe your feature request

pl.Exprs supply a wide array of operator overloads, including __eq__, which is great syntactic sugar for constructing more complicated nested predicate expressions. However, because __eq__ is overloaded to build an expression, we no longer have a way to recursively check that two pl.Exprs are indeed equal to each other. Semantically:

pl.lit(3) == pl.lit(5)

is a boolean expression, not a Python bool. I would like something like an .equals() method:

pl.lit(3).equals(pl.lit(5))

which returns a boolean value in Python. Here, it would be False because pl.lit(3) is an expression node with a value of 3 as its member, while pl.lit(5) is an expression node with a value of 5 as its member. One can imagine more complicated expressions:

(pl.lit(3) + pl.lit(5)).equals(pl.lit(3) + (pl.lit(1) + pl.lit(4)))

This would yield False, even though the corresponding == expression would evaluate to true, because the LHS looks like:

add
├─ pl.lit(3)
└─ pl.lit(5)

And the RHS looks like:

add
├─ pl.lit(3)
└─ add # This node is different!
   ├─ pl.lit(1)
   └─ pl.lit(4)

This would be really helpful if we want applications to build upon polars, since those applications should have tests to verify correctness of the polars expressions created underneath.
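
To make the distinction concrete, here is a minimal sketch in plain Python. It does not use polars: the Expr class, lit() helper, and equals() method below are hypothetical stand-ins that only mimic the behaviour described above, where == builds a new expression node and equals() walks both trees and returns a bool.

```python
class Expr:
    """Toy expression node (hypothetical; a stand-in for pl.Expr, not the real class)."""

    def __init__(self, op, value=None, children=()):
        self.op = op                    # node kind, e.g. "lit", "add", "eq"
        self.value = value              # payload for literal nodes, else None
        self.children = tuple(children)

    # Operator overloads build new expression nodes instead of computing a result.
    def __add__(self, other):
        return Expr("add", children=(self, other))

    def __eq__(self, other):
        # Returns an expression node, not a bool, mirroring the overloaded ==.
        return Expr("eq", children=(self, other))

    # Defining __eq__ disables hashing by default; restore identity hashing.
    __hash__ = object.__hash__

    def equals(self, other):
        """Recursive structural comparison that returns a plain Python bool."""
        if self.op != other.op or self.value != other.value:
            return False
        if len(self.children) != len(other.children):
            return False
        return all(a.equals(b) for a, b in zip(self.children, other.children))


def lit(value):
    """Build a literal node (hypothetical helper)."""
    return Expr("lit", value=value)


lhs = lit(3) + lit(5)
rhs = lit(3) + (lit(1) + lit(4))

print(type(lit(3) == lit(5)).__name__)  # an Expr node, not a bool
print(lit(3).equals(lit(5)))            # structural comparison of two literals
print(lhs.equals(rhs))                  # trees differ in shape
print(lhs.equals(lit(3) + lit(5)))      # same shape and same values
```

In a test suite, equals() (or whatever the real method ends up being called) is what an assertion would call, since assert expr_a == expr_b would only assert on the truthiness of a newly built expression.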

OneRaynyDay, Jul 17 '22 12:07

I am thinking about an Expr.meta -> MetaExpr namespace. This can implement the magic methods on a meta level, e.g. comparing expressions by expression tree. We can also add methods that allow you to modify an existing expression, such as MetaExpr.pop for popping the latest expression of the tree.

ritchie46, Jul 17 '22 14:07

I think this is a good idea. To better understand it, would we be able to convert between Expr and MetaExpr via some_expr.meta() and some_meta_expr.expr()? I would really like the ability to introspect exprs (especially e.g. Fields and what column name they have) so this would be a great addition that solves multiple problems :)
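
A rough sketch of how such a namespace could be shaped, written against a toy expression class rather than polars itself. Everything here is an assumption for illustration: the Expr and MetaExpr classes, the col() helper, the expr() round-trip method, and in particular the reading of pop as "drop the outermost operation and return its inputs".

```python
class Expr:
    """Toy expression node (hypothetical; stands in for pl.Expr)."""

    def __init__(self, op, args=(), inputs=()):
        self.op = op                  # operation name, e.g. "col", "sum", "alias"
        self.args = tuple(args)       # non-expression arguments, e.g. a column name
        self.inputs = tuple(inputs)   # child expressions this operation consumes

    def sum(self):
        return Expr("sum", inputs=(self,))

    def alias(self, name):
        return Expr("alias", args=(name,), inputs=(self,))

    @property
    def meta(self):
        # Entry point into the meta-level namespace.
        return MetaExpr(self)


class MetaExpr:
    """Meta-level view: magic methods here talk about the tree, not the data."""

    def __init__(self, expr):
        self._expr = expr

    def expr(self):
        # Round-trip back to the wrapped expression.
        return self._expr

    def __eq__(self, other):
        if not isinstance(other, MetaExpr):
            return NotImplemented
        a, b = self._expr, other._expr
        return (
            a.op == b.op
            and a.args == b.args
            and len(a.inputs) == len(b.inputs)
            and all(x.meta == y.meta for x, y in zip(a.inputs, b.inputs))
        )

    __hash__ = None  # structural equality without a matching hash: keep unhashable

    def pop(self):
        # Assumed semantics: remove the outermost operation, return its inputs.
        return list(self._expr.inputs)


def col(name):
    """Build a column reference node (hypothetical helper)."""
    return Expr("col", args=(name,))


e = col("a").sum().alias("total")
print(e.meta == col("a").sum().alias("total").meta)  # same tree
print(e.meta == col("a").sum().alias("other").meta)  # alias name differs
print(e.meta.pop()[0].op)                            # operation underneath the alias
```

Keeping the comparison on MetaExpr leaves Expr.__eq__ free to keep building predicate expressions, which is the separation the namespace idea is after.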

OneRaynyDay, Jul 17 '22 19:07

A meta namespace sounds clean; you definitely don't want to mix these with the root-level Expr -> Expr methods.

I solved the same problem for our in-house data DSL (for which polars is a target/engine) by having a dedicated introspection module (a bit like Python's inspect), so the separation was explicit (more like exprs_equal(e1, e2) than e1.meta.equals(e2)). The namespace concept is very consistent within polars, so a .meta makes a lot of sense.

(One nice feature we have is support for custom visitors, allowing for flexible introspection/rewrites/optimisations of arbitrary expression trees; probably a bit harder to offer something like that here, given that the Expr object really lives down in Rust though? 🤔).
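
For readers unfamiliar with the pattern, a custom visitor over an expression tree can be sketched roughly as follows. This is a generic pure-Python illustration, not the in-house DSL mentioned above and not a polars API; the Lit, Add, Visitor, CollectLiterals, and FoldConstants names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """Literal leaf node (hypothetical)."""
    value: int


@dataclass(frozen=True)
class Add:
    """Binary addition node (hypothetical)."""
    left: object
    right: object


class Visitor:
    """Dispatch to visit_<ClassName> based on the node's type."""

    def visit(self, node):
        method = getattr(self, f"visit_{type(node).__name__}")
        return method(node)


class CollectLiterals(Visitor):
    """Introspection: gather every literal value, left to right."""

    def visit_Lit(self, node):
        return [node.value]

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right)


class FoldConstants(Visitor):
    """Rewrite: replace an Add of two literals with a single literal."""

    def visit_Lit(self, node):
        return node

    def visit_Add(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)


tree = Add(Lit(3), Add(Lit(1), Lit(4)))
print(CollectLiterals().visit(tree))
print(FoldConstants().visit(tree))
```

The frozen dataclasses also get structural equality for free, so the folded tree can be compared directly against an expected node in a test.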

alexander-beedie, Jul 18 '22 05:07