fix: spark crc32 custom nullability

Open watanaberin opened this issue 4 weeks ago • 0 comments

Which issue does this PR close?

  • Closes #19157

Rationale for this change

The crc32 UDF was using the default return_type implementation, which does not preserve nullability information (see Spark CRC32). That implementation:

  • Only returns the data type (Int64)
  • Doesn't consider nullability of inputs
  • Would always mark output as non-nullable

What changes are included in this PR?

  • Implemented return_field_from_args: Creates a field with Int64 type and correctly propagates nullability from input fields and scalar arguments
  • Updated return_type: Now returns an error directing users to use return_field_from_args instead
  • Added necessary imports: Field, FieldRef, and ReturnFieldArgs to support the new implementation
  • Added comprehensive nullability tests: Verify that nullable inputs, non-nullable inputs, and null scalar literals are handled correctly
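
The propagation rule described above can be sketched as follows. This is a minimal, std-only illustration and not the code from this PR: `SketchType`, `SketchField`, `SketchScalar`, and `crc32_return_field` are hypothetical stand-ins for the Arrow/DataFusion types and for the `return_field_from_args` method. It assumes the output is nullable when any input field is nullable or any scalar argument is a NULL literal.

```rust
/// Hypothetical stand-in for an Arrow data type; only the variants used here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    Binary,
    Int64,
}

/// Hypothetical stand-in for an Arrow field: a name, a type, and a nullability flag.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    name: String,
    data_type: SketchType,
    nullable: bool,
}

/// Hypothetical stand-in for a scalar argument whose value is known at planning time.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SketchScalar {
    Null,
    Binary(Vec<u8>),
}

/// Builds the output field for `crc32` in this sketch: the type is always Int64,
/// and the field is nullable when any input field is nullable or any scalar
/// argument is a NULL literal (assumed rule, see the text above).
fn crc32_return_field(
    arg_fields: &[SketchField],
    scalar_arguments: &[Option<SketchScalar>],
) -> SketchField {
    let nullable = arg_fields.iter().any(|f| f.nullable)
        || scalar_arguments
            .iter()
            .any(|s| matches!(s, Some(SketchScalar::Null)));

    SketchField {
        name: "crc32".to_string(),
        data_type: SketchType::Int64,
        nullable,
    }
}
```

In this sketch, a non-nullable Binary field with no scalar argument yields a non-nullable Int64 field, while a nullable field or a `SketchScalar::Null` argument yields a nullable one.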

Are these changes tested?

Yes. The new tests verify that:

  • Non-nullable Binary input produces non-nullable Int64 output
  • Nullable Binary input produces nullable Int64 output
  • Null scalar literal (e.g., crc32(NULL)) produces nullable Int64 output
  • Data type is correctly set to Int64 in all cases

Are there any user-facing changes?

This is a bug fix that corrects schema metadata only. It does not change the actual computation or introduce any breaking changes to the API.

watanaberin, Dec 10 '25 22:12