GH-31387: [C++] Check nullability when validating fields on batches or struct arrays

Open singh1203 opened this issue 8 months ago • 3 comments

Rationale for this change

This change ensures that validation catches null values in non-nullable fields, preventing silent errors when writing to formats such as Parquet.

What changes are included in this PR?

Fixes: #31387

  • Added nullability checks to ValidateFull() for arrays, struct arrays, union arrays, and record batches.
  • Introduced new validation logic in validate.cc that recursively checks for nulls in non-nullable fields.
  • Added unit tests in:
    • array_test.cc
    • array_struct_test.cc
    • array_union_test.cc
    • record_batch_test.cc

These tests ensure that ValidateFull() fails when nulls are present in non-nullable fields.
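
As an illustration of the recursive check described above, here is a minimal, self-contained sketch. It does not use Arrow and is not the code in validate.cc: `ToyField`, `ToyArray`, and `CheckNullability` are hypothetical names invented for this example, and the sketch ignores details such as offsets, union types, and whether a parent struct slot is itself null.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for a field and its array data.
// These are NOT the real Arrow classes.
struct ToyField {
  std::string name;
  bool nullable;
  std::vector<ToyField> children;  // non-empty for struct-like fields
};

struct ToyArray {
  std::vector<bool> validity;      // validity[i] == false means slot i is null
  std::vector<ToyArray> children;  // one child array per child field
};

// Returns an empty string when the data matches the declared nullability,
// otherwise a message naming the first offending field and index.
std::string CheckNullability(const ToyField& field, const ToyArray& array,
                             const std::string& parent_path = "") {
  const std::string path =
      parent_path.empty() ? field.name : parent_path + "." + field.name;

  // A non-nullable field must not contain any null slot.
  if (!field.nullable) {
    for (std::size_t i = 0; i < array.validity.size(); ++i) {
      if (!array.validity[i]) {
        return "non-nullable field '" + path + "' has a null at index " +
               std::to_string(i);
      }
    }
  }

  if (field.children.size() != array.children.size()) {
    return "field '" + path + "' has a mismatched number of children";
  }

  // Recurse into child fields (the struct-array case).
  for (std::size_t c = 0; c < field.children.size(); ++c) {
    std::string err =
        CheckNullability(field.children[c], array.children[c], path);
    if (!err.empty()) return err;
  }
  return "";
}
```

In this sketch, a nullable struct field `point` with a non-nullable child `x` would be rejected as soon as `x` contains a null slot, while nulls in a nullable sibling `y` would pass.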

Are these changes tested?

Yes, new unit tests have been added.

Are there any user-facing changes?

Yes. Users who try to validate or write Arrow data with nulls in non-nullable fields will now receive an explicit validation error.

  • GitHub Issue: #31387

singh1203 avatar Apr 14 '25 11:04 singh1203

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

github-actions[bot] avatar Apr 14 '25 11:04 github-actions[bot]

:warning: GitHub issue #31387 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Apr 14 '25 11:04 github-actions[bot]

cc: @pitrou @lidavidm

singh1203 avatar Apr 14 '25 11:04 singh1203