GH-31387: [C++] Check nullability when validating fields on batches or struct arrays
Rationale for this change
This change makes validation catch null values in non-nullable fields, so that such data no longer causes silent errors when it is written to formats like Parquet.
What changes are included in this PR?
Fixes: #31387
- Added nullability checks to ValidateFull() for arrays, struct arrays, union arrays, and record batches.
- Introduced new validation logic in validate.cc that recursively checks for nulls in non-nullable fields.
- Added unit tests in:
  - array_test.cc
  - array_struct_test.cc
  - array_union_test.cc
  - record_batch_test.cc
These tests ensure that ValidateFull() fails when nulls are present in non-nullable fields.
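The recursive check can be illustrated with a minimal, self-contained C++ sketch. It does not use Arrow's API: Field, Column, CheckNullability, ValidateBatchNullability and the error text are all hypothetical stand-ins, and only struct nesting is modeled (unions and lists are omitted). It also assumes one particular policy that the PR text does not spell out: a null in a non-nullable child is ignored at slots where an enclosing struct is itself null.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model (NOT Arrow's API): a schema field with a nullable
// flag and, for struct types, child fields.
struct Field {
  std::string name;
  bool nullable = true;
  std::vector<Field> children;  // struct children; empty for primitive types
};

// Hypothetical toy column: a validity vector (true = non-null) plus one child
// column per struct child, each with the same length as the parent.
struct Column {
  std::vector<bool> valid;
  std::vector<std::shared_ptr<Column>> children;
};

// Returns a message for the first violation found, or std::nullopt.
// parent_valid[i] is true when every enclosing struct is non-null at slot i.
// Assumed policy: nulls are only reported where all enclosing structs are valid.
std::optional<std::string> CheckNullability(const Field& field, const Column& col,
                                            const std::vector<bool>& parent_valid,
                                            const std::string& parent_path) {
  assert(col.valid.size() == parent_valid.size());
  assert(col.children.size() == field.children.size());
  const std::string path =
      parent_path.empty() ? field.name : parent_path + "." + field.name;

  // Slots that are logically valid at this level; passed down to children.
  std::vector<bool> visible(col.valid.size());
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    if (!field.nullable && parent_valid[i] && !col.valid[i]) {
      return "Field '" + path + "' is not nullable but has a null at index " +
             std::to_string(i);
    }
    visible[i] = parent_valid[i] && col.valid[i];
  }

  // Recurse into struct children, using this level's logical validity.
  for (std::size_t c = 0; c < field.children.size(); ++c) {
    if (auto err = CheckNullability(field.children[c], *col.children[c], visible, path)) {
      return err;
    }
  }
  return std::nullopt;
}

// Toy "record batch" check: one top-level column per schema field.
std::optional<std::string> ValidateBatchNullability(const std::vector<Field>& schema,
                                                    const std::vector<Column>& columns,
                                                    std::size_t num_rows) {
  assert(schema.size() == columns.size());
  const std::vector<bool> all_valid(num_rows, true);
  for (std::size_t c = 0; c < schema.size(); ++c) {
    if (auto err = CheckNullability(schema[c], columns[c], all_valid, "")) {
      return err;
    }
  }
  return std::nullopt;
}
```

For example, given a nullable struct column s with validity {valid, null, valid} and a non-nullable child x with validity {valid, null, null}, the sketch reports the null at index 2 and ignores index 1, where s itself is null. Whether the real ValidateFull() masks child nulls this way is not stated in this PR.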
Are these changes tested?
Yes, new unit tests have been added.
Are there any user-facing changes?
Yes. Users who try to validate or write Arrow data with nulls in non-nullable fields will now receive an explicit validation error.
- GitHub Issue: #31387
:warning: GitHub issue #31387 has been automatically assigned in GitHub to PR creator.
cc: @pitrou @lidavidm