Fix boolean casting consistency with Pandas (#20746)

Open aryansri05 opened this issue 1 month ago • 4 comments

Description

Closes #20746.

This PR fixes an inconsistency between cuDF and Pandas when casting floating-point columns containing NaN values to boolean while mode.pandas_compatible is enabled.

The Issue:

  • Pandas: bool(float('nan')) evaluates to True. Casting a Series [1.0, NaN] to bool results in [True, True].
  • cuDF: Previously, NaN values in float columns were treated as nulls, which propagated as nulls after casting to bool ([True, <NA>]).
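The truthiness rule behind the Pandas side of this comparison can be checked with plain Python. This is only a minimal sketch that uses the standard library (no pandas or cuDF); the `values` list is an illustrative stand-in for a float column:

```python
import math

# NaN is a non-zero float, so Python treats it as truthy.
values = [1.0, math.nan, 0.0]

# Element-wise bool() mirrors what a float -> bool cast does in Pandas
# for non-null float data: only 0.0 maps to False.
as_bool = [bool(v) for v in values]

print(as_bool)  # [True, True, False]
```

The cuDF behavior described above differs because the NaN slot is tracked as a null rather than as a float value, so there is nothing for the cast to evaluate.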

The Fix: Updated as_numerical_column in numerical.py. When mode.pandas_compatible is on, if we detect a cast from Float -> Bool on a column with nulls, we explicitly fill the nulls with np.nan before casting. This ensures the underlying cast logic evaluates them as True, matching Pandas behavior.
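The idea of the fix can be sketched with a toy model of a nullable float column. This is not the cuDF implementation: `cast_float_to_bool`, the `values`/`valid` lists, and the `pandas_compatible` flag are hypothetical names used only for illustration, and nulls are represented here by `None`:

```python
import math


def cast_float_to_bool(values, valid, pandas_compatible):
    """Toy model of a nullable float column (hypothetical, not cuDF's API).

    values[i] is meaningful only where valid[i] is True; other slots are null.
    """
    if pandas_compatible:
        # Fill null slots with NaN first, so the cast sees a truthy float.
        filled = [v if ok else math.nan for v, ok in zip(values, valid)]
        return [bool(v) for v in filled]
    # Default mode: nulls propagate through the cast (None stands in for <NA>).
    return [bool(v) if ok else None for v, ok in zip(values, valid)]


column = [1.0, 0.0, 0.0]
validity = [True, False, True]  # the middle slot is null

print(cast_float_to_bool(column, validity, pandas_compatible=True))   # [True, True, False]
print(cast_float_to_bool(column, validity, pandas_compatible=False))  # [True, None, False]
```

Filling before casting keeps the cast itself unchanged; only the input is adjusted when the compatibility mode is on.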

Checklist

  • [x] I am adding a new test (see tests/test_issue_20746.py)
  • [x] I have signed off my commits

aryansri05 avatar Dec 01 '25 14:12 aryansri05

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Dec 01 '25 14:12 copy-pr-bot[bot]

Hi @TomAugspurger, @rjzamora, I've fixed the bug described above. The Label Checker appears to be failing, likely because I don't have permission to add labels. Could you please label this PR and review it when you have a chance? I've updated the branch to the latest main. Thanks!

aryansri05 avatar Dec 06 '25 06:12 aryansri05

Thanks for the PR @aryansri05. Don't worry about the labels, we can add those.

@mroeschke would you have a chance to review this? Do you have a sense for whether the original behavior handling "" differently from pandas in read_csv with boolean dtypes was intended or is relied on?

TomAugspurger avatar Dec 08 '25 12:12 TomAugspurger

/ok to test 6d89facd5ff4d60a9a6d130d86f16eb6de9f07c1

mroeschke avatar Dec 09 '25 21:12 mroeschke

/ok to test 4ed7ff136f5eea7fb55088ab09e823e85c760ab3

mroeschke avatar Dec 10 '25 17:12 mroeschke

/ok to test 510b0765b31470ee8d55b0bf64071b5958e31398

mroeschke avatar Dec 11 '25 17:12 mroeschke

Is there anything left for me to change in the PR?

aryansri05 avatar Dec 11 '25 17:12 aryansri05

Thanks @aryansri05, I will ping you again if any changes are needed on your end.

mroeschke avatar Dec 11 '25 17:12 mroeschke

/ok to test 8173390

mroeschke avatar Dec 11 '25 23:12 mroeschke

/ok to test 0e44d64c88b2e292507ec81a770cc10e45ccd21d

mroeschke avatar Dec 12 '25 19:12 mroeschke

/merge

mroeschke avatar Dec 15 '25 21:12 mroeschke

Thanks @aryansri05

mroeschke avatar Dec 15 '25 21:12 mroeschke