iceberg Core, Spark: Fix delete with filter on nested columns

Fixes #7065.

This fixes Spark deletes that use a filter on nested columns. Currently such operations fail: Spark calls canDeleteUsingMetadata, which uses StrictMetricsEvaluator to decide whether a file can be deleted entirely, but StrictMetricsEvaluator does not yet support evaluating nested columns and throws an NPE (see #7065).

This updates StrictMetricsEvaluator to support evaluation on nested columns, which solves the problem. Only columns nested in a chain of struct fields are supported; for columns nested in map or list fields the evaluator returns ROWS_MIGHT_NOT_MATCH.
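For reviewers, here is a rough, self-contained sketch of the rule described above. It is not the code in this PR and does not use Iceberg's real classes: Field, Kind, Stats, resolveThroughStructs and strictEq are all hypothetical names invented for illustration. It also assumes the per-column statistics are trustworthy, and models the two evaluator results as plain booleans.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are Iceberg's real API.
final class NestedStrictEvalSketch {
  enum Kind { PRIMITIVE, STRUCT, LIST, MAP }

  static final class Field {
    final int id;
    final String name;
    final Kind kind;
    final List<Field> children;

    Field(int id, String name, Kind kind, List<Field> children) {
      this.id = id;
      this.name = name;
      this.kind = kind;
      this.children = children;
    }
  }

  // Per-column file statistics, keyed by field id in the map passed to strictEq.
  static final class Stats {
    final long nullCount;
    final long lower;
    final long upper;

    Stats(long nullCount, long lower, long upper) {
      this.nullCount = nullCount;
      this.lower = lower;
      this.upper = upper;
    }
  }

  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  // Returns the leaf field id when every parent on the dotted path is a struct,
  // or -1 when the path is unknown or passes through a list or map.
  static int resolveThroughStructs(List<Field> topLevel, String dottedName) {
    List<Field> scope = topLevel;
    String[] parts = dottedName.split("\\.");
    for (int i = 0; i < parts.length; i++) {
      Field match = null;
      for (Field f : scope) {
        if (f.name.equals(parts[i])) {
          match = f;
        }
      }
      if (match == null) {
        return -1;
      }
      boolean isLeaf = (i == parts.length - 1);
      if (isLeaf) {
        return match.kind == Kind.PRIMITIVE ? match.id : -1;
      }
      if (match.kind != Kind.STRUCT) {
        return -1; // nested in a list or map: not supported
      }
      scope = match.children;
    }
    return -1;
  }

  // Strict "column == value": true only if every row in the file must match.
  static boolean strictEq(List<Field> schema, Map<Integer, Stats> stats, String column, long value) {
    int id = resolveThroughStructs(schema, column);
    if (id < 0) {
      return ROWS_MIGHT_NOT_MATCH;
    }
    Stats s = stats.get(id);
    if (s == null || s.nullCount > 0) {
      return ROWS_MIGHT_NOT_MATCH;
    }
    return (s.lower == value && s.upper == value) ? ROWS_MUST_MATCH : ROWS_MIGHT_NOT_MATCH;
  }

  public static void main(String[] args) {
    Field lat = new Field(3, "lat", Kind.PRIMITIVE, List.of());
    Field location = new Field(2, "location", Kind.STRUCT, List.of(lat));
    Field element = new Field(5, "element", Kind.PRIMITIVE, List.of());
    Field tags = new Field(4, "tags", Kind.LIST, List.of(element));
    List<Field> schema = List.of(location, tags);
    Map<Integer, Stats> stats = Map.of(3, new Stats(0, 7, 7), 5, new Stats(0, 7, 7));

    System.out.println(strictEq(schema, stats, "location.lat", 7L));
    System.out.println(strictEq(schema, stats, "tags.element", 7L));
  }
}
```

In this sketch a file can be dropped by metadata only when the strict check says every row must match; any path that crosses a list or map falls back to the conservative answer instead of throwing.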

Mar 17 '23 15:03 zhongyujiang

@aokolnychyi @rdblue can you help review this?

Mar 17 '23 16:03 zhongyujiang

PTAL @rdblue @RussellSpitzer @aokolnychyi @szehon-ho

Dec 28 '23 02:12 bluzy

would love to see it merged

Jul 07 '24 12:07 eshishki

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the Iceberg dev mailing list. Thank you for your contributions.

Aug 28 '24 00:08 github-actions[bot]

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

Sep 05 '24 00:09 github-actions[bot]

This issue is still present in Spark 3.5, and fixing it would be a really valuable capability for data that is all in a structured format.

Sep 23 '24 08:09 blakewhatley82

Agreed. Can this be revived, @szehon-ho? Are you able to re-open it, @zhongyujiang?

Oct 03 '24 03:10 mdub

@blakewhatley82 @mdub I think this fix is incorrect because the null count data for nested columns in metadata might currently be incorrect; see #8611. I am not able to reopen this, so I've created a new PR, #11261, which takes a different approach to address this issue.

Oct 05 '24 08:10 zhongyujiang

Fixed by #11261.

Oct 14 '24 07:10 zhongyujiang