Fix boolean casting consistency with Pandas (#20746)
Description
Closes #20746.
This PR fixes an inconsistency between cuDF and Pandas when casting floating-point columns containing NaN values to boolean while mode.pandas_compatible is enabled.
The Issue:
- Pandas: bool(float('nan')) evaluates to True. Casting a Series [1.0, NaN] to bool results in [True, True].
- cuDF: Previously, NaN values in float columns were treated as nulls, which propagated as nulls after casting to bool ([True, <NA>]).
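To make the Pandas side of the comparison concrete, here is a minimal standard-library sketch of the truthiness rule described above. It uses plain Python lists rather than a pandas Series, so it only illustrates the element-wise semantics, not either library's implementation.

```python
# NaN is a non-zero float, so Python treats it as truthy.
nan = float("nan")

values = [1.0, nan, 0.0]

# Element-wise bool() cast, the same rule the description attributes to Pandas.
as_bool = [bool(v) for v in values]

print(as_bool)  # [True, True, False]
```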
The Fix:
Updated as_numerical_column in numerical.py. When mode.pandas_compatible is on, if we detect a cast from Float -> Bool on a column with nulls, we explicitly fill the nulls with np.nan before casting. This ensures the underlying cast logic evaluates them as True, matching Pandas behavior.
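The fill-then-cast idea can be sketched in plain Python. This is a toy model, not the code in numerical.py: the function name cast_float_to_bool, the pandas_compatible argument, and the use of None to stand in for a cuDF null are all assumptions made for illustration.

```python
import math
from typing import List, Optional


def cast_float_to_bool(
    values: List[Optional[float]], pandas_compatible: bool
) -> List[Optional[bool]]:
    """Toy float -> bool cast. None models a null (hypothetical stand-in)."""
    if pandas_compatible:
        # Fill nulls with NaN first, so the cast below sees a truthy float.
        values = [math.nan if v is None else v for v in values]
    # Nulls that remain propagate as nulls; everything else uses bool().
    return [None if v is None else bool(v) for v in values]


column = [1.0, None, 0.0]
print(cast_float_to_bool(column, pandas_compatible=False))  # [True, None, False]
print(cast_float_to_bool(column, pandas_compatible=True))   # [True, True, False]
```

With the flag off, the null survives the cast, which corresponds to the [True, <NA>] result described in the issue. With the flag on, the null becomes NaN before the cast and therefore evaluates to True.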
Checklist
- [x] I am adding a new test (see tests/test_issue_20746.py)
- [x] I have signed off my commits
Hi @TomAugspurger, @rjzamora, I've fixed the bug described above. It looks like the Label Checker is failing, likely because I don't have permission to add labels. Could you please label this PR and review it when you have a chance? I've updated the branch to the latest main. Thanks!
Thanks for the PR @aryansri05. Don't worry about the labels, we can add those.
@mroeschke would you have a chance to review this? Do you have a sense for whether the original behavior handling "" differently from pandas in read_csv with boolean dtypes was intended or is relied on?
/ok to test 6d89facd5ff4d60a9a6d130d86f16eb6de9f07c1
/ok to test 4ed7ff136f5eea7fb55088ab09e823e85c760ab3
/ok to test 510b0765b31470ee8d55b0bf64071b5958e31398
Is there anything left for me to change in the PR?
Thanks @aryansri05, I will ping again if any changes are needed on your end.
/ok to test 8173390
/ok to test 0e44d64c88b2e292507ec81a770cc10e45ccd21d
/merge
Thanks @aryansri05