pandera
Incorrect Pandera Polars DataFrameModel Type Coercion Logic
Describe the bug
When defining a Polars DataFrameModel, setting coerce=True for an individual column will result in the entire DataFrame being coerced.
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandera.
- [X] (optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample, a copy-pastable example
import pandera.polars as pa
import polars as pl


class MyModelSchema(pa.DataFrameModel):
    IntegerColumn: int = pa.Field(coerce=True)
    FloatColumn: float = pa.Field(coerce=False)

    class Config:
        coerce = False


if __name__ == "__main__":
    data = {"IntegerColumn": [1.5, 2.2, 3.1], "FloatColumn": ["1.0", "2.8", "3"]}
    df = pl.DataFrame(data)  # build the frame that is passed to validate()
    validated_df = MyModelSchema.validate(df)
Expected behavior
I set coerce=True for the IntegerColumn, and explicitly False in the Config class as well as the FloatColumn. I pass string values into the "FloatColumn" data, expecting to get a validation error when validating the data against the DataFrameModel since these values aren't float types. However, the string values of "FloatColumn" do get coerced into floats and the resulting validated_df has the FloatColumn as type f64.
Desktop (please complete the following information):
- OS: Windows 11 Pro
- Version: pandera==0.19.0b0
- Python==3.11.5
- polars==0.20.16
Context
Line 390 of ..\pandera\backends\polars\container.py:
    if not (
        schema.coerce or any(col.coerce for col in schema.columns.values())
    ):
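Because this check uses any(...), the same coercion handling runs whenever any single column has coerce=True, so a per-column setting ends up coercing the entire DataFrame.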
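To illustrate why that check is too coarse, here is a minimal sketch using plain Python stand-ins. `Column`, `Schema`, `needs_coercion`, and `columns_to_coerce` are hypothetical names invented for this example; they are not pandera's classes, and this is not necessarily how the fix in the linked PR works. It only contrasts a yes/no gate for the whole frame with a per-column selection.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical stand-in for a schema column (not pandera's class)."""
    coerce: bool = False


@dataclass
class Schema:
    """Hypothetical stand-in for a schema with a frame-level coerce flag."""
    columns: dict = field(default_factory=dict)
    coerce: bool = False


def needs_coercion(schema: Schema) -> bool:
    # Same shape as the check quoted from container.py: a single yes/no
    # answer for the whole frame, with no record of which column asked.
    return schema.coerce or any(col.coerce for col in schema.columns.values())


def columns_to_coerce(schema: Schema) -> list:
    # Per-column alternative (an assumption for illustration): the
    # frame-level flag selects every column, otherwise only flagged ones.
    return [
        name
        for name, col in schema.columns.items()
        if schema.coerce or col.coerce
    ]


schema = Schema(
    columns={
        "IntegerColumn": Column(coerce=True),
        "FloatColumn": Column(coerce=False),
    },
    coerce=False,
)

print(needs_coercion(schema))     # True
print(columns_to_coerce(schema))  # ['IntegerColumn']
```

With the settings from the report, the gate answers True for the frame as a whole, while the per-column selection names only IntegerColumn, which matches the behavior the report expects.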
Hi @evanrasmussen9 are you able to make a PR to fix that line of code?
just kidding, here's a PR fix: https://github.com/unionai-oss/pandera/pull/1612 🙂
Hi Niels,
I was planning to open a PR for the issue but was traveling the last couple days. I appreciate you getting the fix up so quickly!
no worries, thanks for catching all these bugs!
I'm planning on creating the stable 0.19.0 release in the next few days, I'm sure there's more to fix, but I think it's almost ready for prime time
Awesome! Yeah, happy to help. I'm planning to use the Pandera Polars engine heavily in the foreseeable future, so I’ll be sure to help out and bring up anything I find.