Lazily evaluated error expression
Description
There are often situations where I want to do some computation on a LazyFrame and conditionally throw an error, but I don't actually want to collect the LazyFrame yet. I'm curious whether we could have some kind of lazily evaluated error expression that only raises when it is actually evaluated.
For example
assert isinstance(df, pl.LazyFrame)
print(df.schema)
>>> OrderedDict([('x', Int64), ('y', Int64)])
df = df.with_columns(
    pl.when(pl.col('y') != 0)
    .then(pl.col('x') / pl.col('y'))
    .otherwise(pl.raise("Division by zero"))
)
# (nothing happens yet)
df = df.collect()
# Iff there are any zeros in column 'y':
>>> ComputeError: encountered error 'Division by zero'
Just to stress: I understand the above division produces inf, but this is just an example. The point is that there are other computations I may want to do, with business logic that isn't really captured by the type system.
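I like this idea, but unfortunately the when/then architecture doesn't allow for this: when you supply a when/then chain, all branches are computed in parallel and the results are filtered afterwards, so an error expression in the otherwise branch would be evaluated regardless of the condition.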
In some cases you can use Expr.map_batches or Expr.map_elements:
import polars as pl
# business_threshold = 5
business_threshold = 2
def raise_if_too_high(s):
    if (s > business_threshold).any():
        raise ValueError("My business logic doesn't like this.")
    return s

df = pl.DataFrame({"a": [1, 2, 3]})
df.select(
    pl.col("a").map_batches(raise_if_too_high)
)
polars.exceptions.ComputeError: ValueError: My business logic doesn't like this.
Note that if we set business_threshold to 5 then no error is raised.
Your example won't work due to the way when/then/otherwise works.
The following issue is related to your request, so I will close this one in favor of it: https://github.com/pola-rs/polars/issues/11064