iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

Scan Delete Support Part 4: Delete File Loading; Skeleton for Processing

Open sdd opened this issue 9 months ago • 6 comments

Extends the DeleteFileManager introduced in https://github.com/apache/iceberg-rust/pull/950 To include loading of delete files, storage and retrieval of parsed delete files from shared state, and the outline for how parsing will connect up to this new work.

Issue: https://github.com/apache/iceberg-rust/issues/630

sdd avatar Feb 21 '25 09:02 sdd

@liurenjie1024, @Xuanwo, @Fokko - this is ready for re-review, if you could take a look that would be great!

sdd avatar Apr 03 '25 07:04 sdd

Thanks for the review @liurenjie1024 - much appreciated. Will come back with a revised design.

sdd avatar Apr 14 '25 06:04 sdd

Back to you @liurenjie1024 - I've made the changes around missing functionality. Still the open question of if you are ok to defer the structural / performance changes to a follow-up so that we can make more incremental progress rather than adding yet more changes into this PR.

sdd avatar Apr 23 '25 21:04 sdd

Actually @liurenjie1024 I'll go ahead with the structural changes to split this into separate loader and filter structs and update this PR.

sdd avatar Apr 24 '25 07:04 sdd

Actually @liurenjie1024 I'll go ahead with the structural changes to split this into separate loader and filter structs and update this PR.

Sorry for late reply. I'm fine with deferring performance improvement, but have concerns with correctness problem.

liurenjie1024 avatar Apr 27 '25 03:04 liurenjie1024

Hi, @sdd I saw you opened a series of pr for handling reading of deletions. I have a suggestion about it, instead of opening a series prs, you can have a large draft pr containing all your changes, while pick one component to open a small pr for review. This way reviewer could understand your whole design by walking through the large pr, and review carefully small pr. Also, when reviewer have comments, you only need to change one large pr instead of several small ones, WDYT?

liurenjie1024 avatar Apr 29 '25 07:04 liurenjie1024

Thanks @sdd for this pr. There are some missing points in current design. Also I would suggest not putting too much in DeleteFilterManager. I suppose DeleterFilterManager acting more like a delete loader, which manages the io and caching of record batch. The actual filtering part, could delegate to DeleteFilter, WDYT? I think a good reference implementation is java's DeleteFilter, see https://github.com/apache/iceberg/blob/af8e3f5a40f4f36bbe1d868146749e2341471586/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L50

I've refactored as you suggested and you're right, it is a better design.

sdd avatar Apr 30 '25 19:04 sdd

Hi, @sdd I saw you opened a series of pr for handling reading of deletions. I have a suggestion about it, instead of opening a series prs, you can have a large draft pr containing all your changes, while pick one component to open a small pr for review. This way reviewer could understand your whole design by walking through the large pr, and review carefully small pr. Also, when reviewer have comments, you only need to change one large pr instead of several small ones, WDYT?

I Could do, if you think it's worth it - the other two remaining PRs after this one are much smaller and it feels like just as much work to merge those into a single PR and then break PRs out of that again as it does to simply keep rebasing those two PRs on top of this one.

I'll be very glad once this delete file read support is done - it's been a long, hard slog to be honest and I'm struggling to stay motivated with it, but we're not far off now, hopefully.

sdd avatar Apr 30 '25 19:04 sdd

I had a bug in here that was causing the tests to deadlock in the follow-up PRs. I was missing a waker for my custom futures.That's been rectified now and this PR plus the two follow-ups are now ready for review once more.

sdd avatar May 10 '25 12:05 sdd

@liurenjie1024, @Xuanwo - could do with a review again when you get chance! Thanks :-)

sdd avatar May 14 '25 06:05 sdd

I Could do, if you think it's worth it - the other two remaining PRs after this one are much smaller and it feels like just as much work to merge those into a single PR and then break PRs out of that again as it does to simply keep rebasing those two PRs on top of this one.

It's just a minor suggestion, just do it with your favourite approach.

liurenjie1024 avatar May 14 '25 10:05 liurenjie1024

@liurenjie1024 back to you. I've addressed your suggestions that I think make sense to immediately change. I look forward to hearing back from you on the other points! :-)

sdd avatar May 15 '25 06:05 sdd

Let's wait for a moment to merge it after 0.5.0 release

liurenjie1024 avatar May 16 '25 07:05 liurenjie1024

Hi @liurenjie1024 / @Xuanwo / @xxchan.

This is now ready again for review after a refactor taking into account @xxchan's great feedback. I'll be on holiday for a week after today so it would be great if you guys could take a look. Thanks!

sdd avatar May 23 '25 06:05 sdd