
Lazily evaluated error expression

Open TimonKnigge opened this issue 1 year ago • 1 comments

Description

There are often situations where I want to do some computation on a LazyFrame and conditionally throw an error, but I don't want to collect the LazyFrame yet. Could we have some kind of lazily evaluated error expression that only raises when it is actually evaluated?

For example

assert isinstance(df, pl.LazyFrame)
print(df.schema)
>>> OrderedDict([('x', Int64), ('y', Int64)])

df = df.with_columns(
    pl.when(pl.col('y') != 0)
    .then(pl.col('x') / pl.col('y'))
    .otherwise(pl.raise("Division by zero"))  # pl.raise is the proposed expression, not an existing Polars function
)
# (nothing happens yet)

df = df.collect()
# Iff there are any zeros in column 'y':
>>> ComputeError: encountered error 'Division by zero'

Just to stress: I understand the above division produces inf, but this is only an example. The point is that there are other computations I may want to do, with business logic that isn't captured by the type system.

TimonKnigge, Mar 20 '24 13:03

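I like this idea, but unfortunately the when/then architecture doesn't allow for this: when you supply a when/then chain, all branch expressions are computed in parallel and then filtered by the condition, so an error expression in any branch would raise regardless of the data.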
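To illustrate why an error in a branch cannot be conditional under that model, here is a toy sketch in plain Python. It is not Polars code and does not reflect Polars internals; `when_then_otherwise` and `always_raise` are hypothetical names made up for this example. Both branches are computed for every row before the mask is applied.

```python
from typing import Callable, List


def when_then_otherwise(
    mask: List[bool],
    then: Callable[[], List[float]],
    otherwise: Callable[[], List[float]],
) -> List[float]:
    # Toy model (an assumption, not Polars internals): both branches are
    # fully computed first, and only afterwards is the mask used to select.
    then_values = then()
    otherwise_values = otherwise()
    return [t if m else o for m, t, o in zip(mask, then_values, otherwise_values)]


def always_raise() -> List[float]:
    # Stand-in for the proposed error expression.
    raise ValueError("Division by zero")


x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0]
mask = [v != 0 for v in y]  # all True, so the otherwise branch is never selected

try:
    when_then_otherwise(mask, lambda: [a / b for a, b in zip(x, y)], always_raise)
    raised = False
except ValueError:
    # The error fires even though no row needed the otherwise branch.
    raised = True
```

In this model the error fires even though no row selects the `otherwise` branch, which is why a branch-level error expression would not behave conditionally.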

In some cases you can use Expr.map_batches or Expr.map_elements:

import polars as pl

# business_threshold = 5
business_threshold = 2

def raise_if_too_high(s):
    if (s > business_threshold).any():
        raise ValueError("My business logic doesn't like this.")
    return s

df = pl.DataFrame({"a": [1, 2, 3]})

df.select(
    pl.col("a").map_batches(lambda s: raise_if_too_high(s))
)
# Raises: polars.exceptions.ComputeError: ValueError: My business logic doesn't like this.

Note that if we set business_threshold to 5 then no error is raised.

mcrumiller, Mar 20 '24 17:03

Your example won't work due to the way when/then/otherwise works.

The following issue is related to your request, I will close this in favor of that one: https://github.com/pola-rs/polars/issues/11064

stinodego, Mar 21 '24 10:03