datafusion
fix: spark crc32 custom nullability
Which issue does this PR close?
- Closes #19157
Rationale for this change
The Spark crc32 UDF (see Spark CRC32) was using the default return_type implementation, which does not preserve nullability information. The default implementation:
- Only returns the data type (Int64)
- Doesn't consider nullability of inputs
- Would always mark output as non-nullable
What changes are included in this PR?
- Implemented return_field_from_args: Creates a field with Int64 type and correctly propagates nullability from input fields and scalar arguments
- Updated return_type: Now returns an error directing users to use return_field_from_args instead
- Added necessary imports: Field, FieldRef, and ReturnFieldArgs to support the new implementation
- Added comprehensive nullability tests: Verify that nullable inputs, non-nullable inputs, and null scalar literals are handled correctly
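
The nullability rule described in the list above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the `DataType`, `Field`, and `ReturnArgs` types below are simplified, hypothetical stand-ins for the Arrow/DataFusion types, and `crc32_return_field` / `crc32_return_type` are illustrative names. The sketch assumes the output is nullable whenever any input field is nullable or any scalar argument is a NULL literal.

```rust
// Simplified stand-ins for illustration only; these are NOT the
// arrow/DataFusion types, just enough structure to show the rule.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Binary,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Hypothetical analogue of the arguments handed to return_field_from_args.
struct ReturnArgs<'a> {
    // One field per argument, carrying its declared nullability.
    arg_fields: &'a [Field],
    // Per argument: Some(is_null) for a scalar literal, None for a column.
    scalar_is_null: &'a [Option<bool>],
}

// Builds the output field: always Int64, nullable if any input field is
// nullable or any scalar argument is a NULL literal (assumed rule).
fn crc32_return_field(args: &ReturnArgs) -> Field {
    let nullable = args.arg_fields.iter().any(|f| f.nullable)
        || args.scalar_is_null.iter().any(|s| *s == Some(true));
    Field {
        name: "crc32".to_string(),
        data_type: DataType::Int64,
        nullable,
    }
}

// The type-only entry point cannot see nullability, so in this sketch it
// returns an error pointing callers at the field-based function instead.
fn crc32_return_type() -> Result<DataType, String> {
    Err("return_field_from_args should be used instead".to_string())
}
```

Under these assumptions, a non-nullable Binary column yields a non-nullable Int64 field, while a nullable column or a NULL literal yields a nullable one. Returning an error from the type-only path makes any caller that would silently lose nullability fail loudly.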
Are these changes tested?
Yes. The added nullability tests verify that:
- Non-nullable Binary input produces non-nullable Int64 output
- Nullable Binary input produces nullable Int64 output
- Null scalar literal (e.g., crc32(NULL)) produces nullable Int64 output
- Data type is correctly set to Int64 in all cases
Are there any user-facing changes?
No. This is a bug fix that corrects schema metadata only. It does not change the actual computation or introduce any breaking changes to the API.