fix: correct nullability propagation for spark.bitwise_not
Which issue does this PR close?
- Closes #19150.
Rationale for this change
The Spark bitwise_not UDF always appeared as nullable in logical plans, even when its input column was non-nullable.
This happened because the UDF implemented only return_type(), which returns a DataType but does not propagate nullability.
DataFusion requires UDFs to implement return_field_from_args() when nullability depends on input fields.
As a result:
- bitwise_not(non_nullable_col) incorrectly produced a nullable output.
- Downstream query planning and schema inference became inconsistent.
- This differed from both Spark semantics and Arrow kernel behavior, where nullability is preserved.
This PR corrects the nullability inference.
What changes are included in this PR?
- Implemented return_field_from_args() for the Spark bitwise_not UDF.
  - Output type = input type
  - Output nullability = input nullability
- Updated return_type() to return an error, per DataFusion API guidelines when overriding nullability.
- Added unit tests verifying:
  - Non-nullable input → non-nullable output
  - Nullable input → nullable output
  - Behavior across multiple integer types (Int32, Int64)
- Code comments and minor cleanup.
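As a rough illustration of the rule listed above (not the code in this PR), the sketch below models it with simplified stand-in types. `DataType`, `Field`, and the two free functions are hypothetical names for this example only; the real change goes through DataFusion's UDF trait and Arrow's field types, and its error type is not a `String`.

```rust
// Simplified stand-ins for illustration; these are NOT the Arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Models the type-only entry point: once nullability is derived from the
/// input field, this path is expected to return an error instead of a type.
fn return_type(_arg_types: &[DataType]) -> Result<DataType, String> {
    Err("return_type should not be called; use return_field_from_args".to_string())
}

/// Models the field-based entry point: the output keeps the input's data type
/// and the input's nullability.
fn return_field_from_args(arg_fields: &[Field]) -> Result<Field, String> {
    // bitwise_not takes exactly one argument.
    let [input] = arg_fields else {
        return Err(format!(
            "bitwise_not expects 1 argument, got {}",
            arg_fields.len()
        ));
    };
    match input.data_type {
        DataType::Int32 | DataType::Int64 => Ok(Field {
            name: "bitwise_not".to_string(),
            data_type: input.data_type.clone(), // output type = input type
            nullable: input.nullable,           // output nullability = input nullability
        }),
        _ => Err(format!("bitwise_not does not support {:?}", input.data_type)),
    }
}
```

Under these assumptions, a non-nullable Int32 input yields a non-nullable Int32 output field, and a nullable Int64 input yields a nullable Int64 output field.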
Are these changes tested?
Yes.
This PR includes new unit tests that validate:
- correct nullability propagation
- correct output types
- consistent behavior across supported integer types
Are there any user-facing changes?
Yes, but they are behavior-correcting, not breaking:
- The spark.bitwise_not UDF now correctly reports nullability in schemas and logical plans.
- No API changes.
- No behavioral change for actual runtime values — Arrow kernels already preserved null bitmaps; only planner metadata was incorrect.
This is not considered a breaking change.
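To make the runtime point concrete, here is a minimal sketch in which `Option<i32>` stands in for a slot of an Arrow array with a validity bitmap. The helper `bitwise_not_i32` is hypothetical and is not the kernel DataFusion uses; it only shows that nulls pass through unchanged while non-null values are inverted.

```rust
/// Applies bitwise NOT element-wise; `None` models a null slot and is preserved.
fn bitwise_not_i32(values: &[Option<i32>]) -> Vec<Option<i32>> {
    values.iter().map(|v| v.map(|x| !x)).collect()
}
```

With this model, a column without nulls never produces a null, which is why reporting the output as non-nullable for non-nullable input matches what already happens at runtime.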
@martin-g please review.
I am trying to clear out the merge queue, so I took the liberty of merging up from main and resolving a clippy issue.
Thanks @shifluxxc @rluvaton and @martin-g