
fix: correct nullability propagation for spark.bitwise_not

Open shifluxxc opened this issue 1 month ago • 1 comments

Which issue does this PR close?

  • Closes #19150.

Rationale for this change

The Spark bitwise_not UDF always appeared as nullable in logical plans, even when its input column was non-nullable.

This happened because the UDF implemented only return_type(), which returns a DataType but does not propagate nullability.
DataFusion requires UDFs to implement return_field_from_args() when nullability depends on input fields.

As a result:

  • bitwise_not(non_nullable_col) incorrectly produced a nullable output.
  • Downstream query planning and schema inference became inconsistent.
  • This differed from both Spark semantics and Arrow kernel behavior, where nullability is preserved.

This PR corrects the nullability inference.

What changes are included in this PR?

  • Implemented return_field_from_args() for the Spark bitwise_not UDF.
    • Output type = input type
    • Output nullability = input nullability
  • Updated return_type() to return an error, following the DataFusion API guideline for UDFs that implement return_field_from_args().
  • Added unit tests verifying:
    • Non-nullable input → non-nullable output
    • Nullable input → nullable output
    • Behavior across multiple integer types (Int32, Int64)
  • Code comments and minor cleanup.
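
The difference between type-only and field-aware inference can be sketched without any DataFusion dependency. In the sketch below, `DataType` and `Field` are simplified stand-ins rather than the real arrow-rs types, and `output_from_type_only` and `output_from_field` are hypothetical names chosen for illustration. It assumes that a planner given only a `DataType` has to fall back to a nullable output, which matches the behavior described in the rationale above; it is not the code from this PR.

```rust
// Simplified stand-ins for Arrow's DataType and Field.
// These are NOT the real arrow-rs types; they only model the idea.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Type-only inference (hypothetical helper): no input field is available,
/// so the output is conservatively marked nullable.
fn output_from_type_only(input_type: &DataType) -> Field {
    Field {
        name: "bitwise_not".to_string(),
        data_type: input_type.clone(),
        nullable: true,
    }
}

/// Field-aware inference (hypothetical helper): output type and nullability
/// both mirror the input field.
fn output_from_field(input: &Field) -> Field {
    Field {
        name: "bitwise_not".to_string(),
        data_type: input.data_type.clone(),
        nullable: input.nullable,
    }
}

fn main() {
    let non_nullable = Field {
        name: "a".to_string(),
        data_type: DataType::Int32,
        nullable: false,
    };
    let nullable = Field {
        name: "b".to_string(),
        data_type: DataType::Int64,
        nullable: true,
    };

    // Type-only inference loses the non-nullable information.
    assert!(output_from_type_only(&non_nullable.data_type).nullable);

    // Field-aware inference preserves it in both directions.
    assert!(!output_from_field(&non_nullable).nullable);
    assert!(output_from_field(&nullable).nullable);
    assert_eq!(output_from_field(&nullable).data_type, DataType::Int64);
}
```

The assertions in `main` mirror the cases listed for the unit tests: non-nullable in, non-nullable out; nullable in, nullable out; and the output type following the input type.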

Are these changes tested?

Yes.

This PR includes new unit tests that validate:

  • correct nullability propagation
  • correct output types
  • consistent behavior across supported integer types

Are there any user-facing changes?

Yes, but they are behavior-correcting, not breaking:

  • The spark.bitwise_not UDF now correctly reports nullability in schemas and logical plans.
  • No API changes.
  • No behavioral change for actual runtime values — Arrow kernels already preserved null bitmaps; only planner metadata was incorrect.

This is not considered a breaking change.
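
To illustrate why runtime values are unaffected, here is a minimal sketch of element-wise bitwise NOT over a column with nulls. It uses `Option<i32>` as a stand-in for Arrow's validity bitmap, and `bitwise_not_i32` is a hypothetical helper, not the actual kernel. Null slots pass through unchanged, so the values were already correct and only the declared schema nullability needed fixing.

```rust
/// Applies bitwise NOT to each value; `None` models a null slot and is
/// passed through untouched. Hypothetical helper, not the Arrow kernel.
fn bitwise_not_i32(values: &[Option<i32>]) -> Vec<Option<i32>> {
    values.iter().map(|v| v.map(|x| !x)).collect()
}

fn main() {
    let input = vec![Some(0), None, Some(5), Some(-1)];
    let output = bitwise_not_i32(&input);

    // Two's complement: !0 == -1, !5 == -6, !(-1) == 0; the null stays null.
    assert_eq!(output, vec![Some(-1), None, Some(-6), Some(0)]);

    // A column with no nulls yields a column with no nulls.
    let dense = bitwise_not_i32(&[Some(1), Some(2)]);
    assert!(dense.iter().all(|v| v.is_some()));
}
```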

shifluxxc avatar Dec 09 '25 07:12 shifluxxc

@martin-g please review.

shifluxxc avatar Dec 11 '25 10:12 shifluxxc

I am trying to clear out the merge queue so I took the liberty of merging up from main and resolving a clippy issue

alamb avatar Dec 11 '25 22:12 alamb

Thanks @shifluxxc @rluvaton and @martin-g

alamb avatar Dec 12 '25 18:12 alamb