iceberg-rust
iceberg-rust copied to clipboard
Scan Delete Support Part 4: Delete File Loading; Skeleton for Processing
Extends the DeleteFileManager introduced in https://github.com/apache/iceberg-rust/pull/950 To include loading of delete files, storage and retrieval of parsed delete files from shared state, and the outline for how parsing will connect up to this new work.
Issue: https://github.com/apache/iceberg-rust/issues/630
@liurenjie1024, @Xuanwo, @Fokko - this is ready for re-review, if you could take a look that would be great!
Thanks for the review @liurenjie1024 - much appreciated. Will come back with a revised design.
Back to you @liurenjie1024 - I've made the changes around missing functionality. Still the open question of if you are ok to defer the structural / performance changes to a follow-up so that we can make more incremental progress rather than adding yet more changes into this PR.
Actually @liurenjie1024 I'll go ahead with the structural changes to split this into separate loader and filter structs and update this PR.
Actually @liurenjie1024 I'll go ahead with the structural changes to split this into separate loader and filter structs and update this PR.
Sorry for late reply. I'm fine with deferring performance improvement, but have concerns with correctness problem.
Hi, @sdd I saw you opened a series of pr for handling reading of deletions. I have a suggestion about it, instead of opening a series prs, you can have a large draft pr containing all your changes, while pick one component to open a small pr for review. This way reviewer could understand your whole design by walking through the large pr, and review carefully small pr. Also, when reviewer have comments, you only need to change one large pr instead of several small ones, WDYT?
Thanks @sdd for this pr. There are some missing points in current design. Also I would suggest not putting too much in
DeleteFilterManager. I supposeDeleterFilterManageracting more like a delete loader, which manages the io and caching of record batch. The actual filtering part, could delegate toDeleteFilter, WDYT? I think a good reference implementation is java's DeleteFilter, see https://github.com/apache/iceberg/blob/af8e3f5a40f4f36bbe1d868146749e2341471586/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L50
I've refactored as you suggested and you're right, it is a better design.
Hi, @sdd I saw you opened a series of pr for handling reading of deletions. I have a suggestion about it, instead of opening a series prs, you can have a large draft pr containing all your changes, while pick one component to open a small pr for review. This way reviewer could understand your whole design by walking through the large pr, and review carefully small pr. Also, when reviewer have comments, you only need to change one large pr instead of several small ones, WDYT?
I Could do, if you think it's worth it - the other two remaining PRs after this one are much smaller and it feels like just as much work to merge those into a single PR and then break PRs out of that again as it does to simply keep rebasing those two PRs on top of this one.
I'll be very glad once this delete file read support is done - it's been a long, hard slog to be honest and I'm struggling to stay motivated with it, but we're not far off now, hopefully.
I had a bug in here that was causing the tests to deadlock in the follow-up PRs. I was missing a waker for my custom futures.That's been rectified now and this PR plus the two follow-ups are now ready for review once more.
@liurenjie1024, @Xuanwo - could do with a review again when you get chance! Thanks :-)
I Could do, if you think it's worth it - the other two remaining PRs after this one are much smaller and it feels like just as much work to merge those into a single PR and then break PRs out of that again as it does to simply keep rebasing those two PRs on top of this one.
It's just a minor suggestion, just do it with your favourite approach.
@liurenjie1024 back to you. I've addressed your suggestions that I think make sense to immediately change. I look forward to hearing back from you on the other points! :-)
Let's wait for a moment to merge it after 0.5.0 release
Hi @liurenjie1024 / @Xuanwo / @xxchan.
This is now ready again for review after a refactor taking into account @xxchan's great feedback. I'll be on holiday for a week after today so it would be great if you guys could take a look. Thanks!