Support reading Iceberg split with equality deletes
This PR introduces EqualityDeleteFileReader, which is used to read Iceberg splits with equality delete files.
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | 8b0ed7ff524718d4b5e96c83142ddaef1253a6fa |
| Latest deploy log | https://app.netlify.com/projects/meta-velox/deploys/68884d67fb888500089e9bf9 |
@Yuhta This PR replaces https://github.com/facebookincubator/velox/pull/9063 and is ready for review again. As you suggested, we now update the ScanSpec by inserting/removing the new columns instead of cloning the whole ScanSpec. Thanks a lot for reviewing!
@yingsu00 can you please rebase again? There is a conflict with other changes.
@Yuhta Just rebased, appreciate your review again. Thanks!
cc: @liujiayi771 Would you like to take a review? Thanks.
@liujiayi771 Can you take a look at the PR if possible? Should we add anything on the Gluten side after this PR is merged? It's requested by a Gluten customer.
@FelixYBW Yes, Gluten needs to make some minor changes to accommodate this PR. However, Spark cannot produce equality delete files, so we need to use Flink to generate Iceberg tables with equality delete files for testing. I will run some tests this week.
@nmahadevuni can you help resolve the conflicts with main and rebase? Thanks.
This PR also has a test failure in HashJoinTest.dynamicFilters that goes away without these changes. It is easily reproducible locally and can be seen failing in the CI pipeline:
```
[ RUN ] HashJoinTest.dynamicFilters
I0211 11:45:19.812211 1319372 Compression.cpp:629] Initialized zstd compressor with compression level 7
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:4682: Failure
Expected: (getInputPositions(task, 1)) < (numRowsProbe * numSplits), actual: 3330 vs 3330
Google Test trace:
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:4669: hasSpill:false
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:665: Without Spill
/root/lakehouse/velox/velox/exec/tests/utils/QueryAssertions.cpp:1182: Failure
Failed
Expected 1000, got 3330
2330 extra rows, 0 missing rows
10 of extra rows:
1
1
1
1
1
1
1
1
1
1
0 of missing rows:
Note: DuckDB only supports timestamps of millisecond precision. If this test involves timestamp inputs, please make sure you use the right precision.
DuckDB query: SELECT t.c1 + 1 FROM t, u WHERE t.c0 = u.c0 AND t.c0 < 500
Google Test trace:
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:665: Without Spill
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:4805: Failure
Expected: (getInputPositions(task, 1)) < (numRowsProbe * numSplits), actual: 3330 vs 3330
Google Test trace:
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:4792: hasSpill:false
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:665: Without Spill
/root/lakehouse/velox/velox/exec/tests/utils/QueryAssertions.cpp:1182: Failure
Failed
Expected 830, got 2450
1620 extra rows, 0 missing rows
10 of extra rows:
1
1
1
1
1
1
1
1
1
1
0 of missing rows:
Note: DuckDB only supports timestamps of millisecond precision. If this test involves timestamp inputs, please make sure you use the right precision.
DuckDB query: SELECT t.c1 + 1 FROM t, u WHERE t.c0 = u.c0 AND t.c0 < 200
Google Test trace:
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:665: Without Spill
/root/lakehouse/velox/velox/exec/tests/utils/QueryAssertions.cpp:1182: Failure
Failed
Expected 830, got 2450
1620 extra rows, 0 missing rows
10 of extra rows:
1
1
1
1
1
1
1
1
1
1
0 of missing rows:
Note: DuckDB only supports timestamps of millisecond precision. If this test involves timestamp inputs, please make sure you use the right precision.
DuckDB query: SELECT t.c1 + 1 FROM t WHERE t.c0 IN (SELECT c0 FROM u) AND t.c0 < 200
Google Test trace:
/root/lakehouse/velox/velox/exec/tests/HashJoinTest.cpp:665: Without Spill
[ FAILED ] HashJoinTest.dynamicFilters (4442 ms)
[----------] 1 test from HashJoinTest (4443 ms total)
```
Thanks @czentgr. I have fixed the issue.
@Yuhta @rui-mo Will you be able to review again? Thanks!
@Yuhta @rui-mo We've updated this PR, adding a separate commit that fixes the nested subfield filter, plus a new test showing why the filter must be recursively added to and removed from the nested column in ScanSpec. Would appreciate your review again! Thank you.
Please describe the restrictions in the PR description, e.g. hugeint, float, Spark's not-equal, and timestamp are not supported.
If the user runs a query like `select id, data, _deleted from table`, how can we return the metadata column `_deleted`, and likewise `_pos`?
Currently Presto Java and C++ only support the $path and $data_sequence_number metadata columns. See https://prestodb.io/docs/current/connector/iceberg.html#extra-hidden-metadata-columns. We will be adding more in the future. See https://github.com/prestodb/presto/issues/24733
@rui-mo @Yuhta The PR is updated, could you please review again? Thank you!
@ethanyzhang imported this issue into IBM GitHub Enterprise
@yingsu00 Do you think all the review comments have been resolved so far?
Yes, we're waiting for Meta's feedback. We also added a new test, TestSubFieldEqualityDelete, to show why we needed to tag internal (not top-level) nodes in ScanSpec. We've been resolving conflicts regularly, but they come too often and take too much time. If this PR can get reviewed soon, we'll rebase again.
@yingsu00 Do we still want to proceed with this change, or should we just start a new IcebergDataSource directly inheriting the abstract DataSource without going through HiveDataSource?
@Yuhta Hi Jimmy, thanks for reviewing again! We are preparing the new IcebergConnector PR and it will be out in a couple of weeks. I think we can wait until that one is merged first, and then re-submit this PR on top of the new IcebergConnector. Thank you in advance for reviewing the upcoming PRs! Your review is very much appreciated.