pandera
`SchemaFieldNotFoundError` with custom check function if no alias is provided.
Describe the bug
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [x] (optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample, a copy-pastable example
I'm trying to validate a polars dataframe using a custom check function.
import polars as pl
import pandera.polars as pa

# Custom check function: fails a row only when both column1 and the
# column under validation (df.key) are null.
def check_custom_condition(df: pa.PolarsData) -> pl.LazyFrame:
    return df.lazyframe.select(
        pl.when(
            pl.col("column1").is_null()
            & pl.col(df.key).is_null()
        )
        .then(False)
        .otherwise(True)
        # .alias("check_result")  # Uncomment this line to avoid the issue
    )

# Define the schema for the DataFrame
schema = pa.DataFrameSchema({
    "column1": pa.Column(
        dtype=str,
        nullable=True,
    ),
    "column2": pa.Column(
        dtype=str,
        nullable=True,
        checks=[
            pa.Check(check_fn=check_custom_condition),
        ],
    ),
})

# Example DataFrame
data = {
    "column1": [None, "x", "y"],
    "column2": ["a", None, "c"],
}
df = pl.DataFrame(data)

# Validate the DataFrame using the schema and custom check
schema.validate(df, lazy=True)
The example above produces the following error:
{
    "DATA": {
        "CHECK_ERROR": [
            {
                "schema": null,
                "column": "column2",
                "check": "check_custom_condition",
                "error": "SchemaFieldNotFoundError(\"literal\")"
            }
        ]
    }
}
Expected behavior
I would expect the schema validation to run successfully here. When we uncomment the .alias("check_result")
line, the schema validation runs without error. I'm trying to understand if this behavior is expected, or if this is a bug.
Desktop (please complete the following information):
- OS: Windows 10 & Ubuntu 22.04.4
- Browser: Chrome
- Version: pandera==0.19.3