Incorrect Pandera Polars DataFrameModel Type Coercion Logic

Open evanrasmussen9 opened this issue 1 year ago • 1 comments

Describe the bug

When defining a Polars DataFrameModel, setting coerce=True on an individual column results in every column of the DataFrame being coerced.

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of pandera.
  • [X] (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

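import pandera.polars as pa
import polars as pl


class MyModelSchema(pa.DataFrameModel):
    # Only this column opts in to coercion.
    IntegerColumn: int = pa.Field(coerce=True)
    FloatColumn: float = pa.Field(coerce=False)

    class Config:
        # Schema-level coercion is explicitly disabled.
        coerce = False


if __name__ == "__main__":
    data = {"IntegerColumn": [1.5, 2.2, 3.1], "FloatColumn": ["1.0", "2.8", "3"]}
    # Assumed fix: the original snippet validated `df` without ever defining it.
    df = pl.DataFrame(data)
    validated_df = MyModelSchema.validate(df)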

Expected behavior

I set coerce=True for IntegerColumn and explicitly set coerce=False both in the Config class and on FloatColumn. I pass string values as the "FloatColumn" data and expect a validation error when validating against the DataFrameModel, since these values are not floats. Instead, the "FloatColumn" strings are coerced to floats, and the resulting validated_df has FloatColumn as type f64.

Desktop (please complete the following information):

  • OS: Windows 11 Pro
  • Version: pandera==0.19.0b0
  • Python==3.11.5
  • polars==0.20.16

Screenshots

[Screenshot: pandera_polars_coercion_bug]

Context

Line 390 of ..\pandera\backends\polars\container.py:

if not (
    schema.coerce or any(col.coerce for col in schema.columns.values())
):

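Because of this check, coercion is handled the same way whenever any single column has coerce=True: the whole DataFrame is coerced, just as if schema-level coerce were True.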
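To make the effect of that condition concrete, here is a minimal sketch in plain Python that does not use pandera at all. `ColumnSpec`, `coerce_all_or_nothing`, and `coerce_per_column` are hypothetical names made up for illustration, and the early return after the `if not (...)` check is an assumption about what follows the quoted lines. The per-column variant is just one way the opt-in could be scoped, not necessarily how pandera handles it.

```python
from dataclasses import dataclass


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for a schema column (not pandera's class)."""
    name: str
    caster: type
    coerce: bool = False


def coerce_all_or_nothing(data, columns, schema_coerce=False):
    """Mirror the quoted check: one opted-in column coerces every column."""
    # Assumption: when nothing opts in, the data is returned unchanged.
    if not (schema_coerce or any(col.coerce for col in columns)):
        return data
    return {col.name: [col.caster(v) for v in data[col.name]] for col in columns}


def coerce_per_column(data, columns, schema_coerce=False):
    """Coerce a column only if it, or the schema as a whole, opts in."""
    out = dict(data)
    for col in columns:
        if schema_coerce or col.coerce:
            out[col.name] = [col.caster(v) for v in data[col.name]]
    return out


columns = [
    ColumnSpec("IntegerColumn", int, coerce=True),
    ColumnSpec("FloatColumn", float, coerce=False),
]
data = {"IntegerColumn": [1.5, 2.2, 3.1], "FloatColumn": ["1.0", "2.8", "3"]}

buggy = coerce_all_or_nothing(data, columns)
fixed = coerce_per_column(data, columns)

print(buggy["FloatColumn"])  # strings were cast to floats despite coerce=False
print(fixed["FloatColumn"])  # strings left alone, so a later dtype check can fail
```

In the per-column version the "FloatColumn" strings reach the dtype check untouched, which is the behavior I expected from coerce=False.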

evanrasmussen9 avatar May 03 '24 17:05 evanrasmussen9

Hi @evanrasmussen9 are you able to make a PR to fix that line of code?

cosmicBboy avatar May 03 '24 18:05 cosmicBboy

just kidding, here's a PR fix: https://github.com/unionai-oss/pandera/pull/1612 🙂

cosmicBboy avatar May 04 '24 04:05 cosmicBboy

Hi Niels,

I was planning to open a PR for the issue but was traveling the last couple days. I appreciate you getting the fix up so quickly!

evanrasmussen9 avatar May 05 '24 01:05 evanrasmussen9

no worries, thanks for catching all these bugs!

I'm planning on creating the stable 0.19.0 release in the next few days, I'm sure there's more to fix, but I think it's almost ready for prime time

cosmicBboy avatar May 05 '24 02:05 cosmicBboy

Awesome! Yeah, happy to help. I'm planning to use the Pandera Polars engine heavily for the foreseeable future, so I’ll be sure to help out and bring up anything I find.

evanrasmussen9 avatar May 05 '24 02:05 evanrasmussen9