iceberg-rust
iceberg-rust copied to clipboard
feat: support position delete writer
Complete #340
There are two kinds of writers in iceberg:
- Plain position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
- Sorting position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
It seems that this pr tries to implement 2, while there are some missing part there. I would suggest to implement 1 first as it's easier, what do you think?
Is position delete must be sorted or it just be optional? From the iceberg spec, it looks like it must be sorted. https://iceberg.apache.org/spec/#position-delete-files:~:text=The%20rows%20in%20the%20delete%20file%20must%20be%20sorted%20by%20file_path%20then%20pos%20to%20optimize%20filtering%20rows%20while%20scanning.
There are two kinds of writers in iceberg:
- Plain position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
- Sorting position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
It seems that this pr tries to implement 2, while there are some missing part there. I would suggest to implement 1 first as it's easier, what do you think?
Is position delete must be sorted or it just be optional? From the iceberg spec, it looks like it must be sorted. https://iceberg.apache.org/spec/#position-delete-files:~:text=The%20rows%20in%20the%20delete%20file%20must%20be%20sorted%20by%20file_path%20then%20pos%20to%20optimize%20filtering%20rows%20while%20scanning.
Yes, it's required in spec, but some compute engine could sort this before passing to writer, and writer doesn't need to handle sorting itself.
There are two kinds of writers in iceberg:
- Plain position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
- Sorting position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
It seems that this pr tries to implement 2, while there are some missing part there. I would suggest to implement 1 first as it's easier, what do you think?
Is position delete must be sorted or it just be optional? From the iceberg spec, it looks like it must be sorted. https://iceberg.apache.org/spec/#position-delete-files:~:text=The%20rows%20in%20the%20delete%20file%20must%20be%20sorted%20by%20file_path%20then%20pos%20to%20optimize%20filtering%20rows%20while%20scanning.
Yes, it's required in spec, but some compute engine could sort this before passing to writer, and writer doesn't need to handle sorting itself.
Make sense. Let's implement 1 first
I think we can resolve #741 first before this PR.
@ZENOTME Are you still working on this? I'm looking to work on one of the two writers
@ZENOTME Are you still working on this? I'm looking to work on one of the two writers
Sorry for the late, I will work on this later. Would you like to work on sorting position delete writer after this PR?
Hi @liurenjie1024 @jonathanc-n. I have fixed this PR. It's ready for review.
hi @liurenjie1024, is there other improvement for this PR? I think it's ready to go.
@liurenjie1024 @Xuanwo Any more comments? Shall we merge this PR?
@liurenjie1024 @Xuanwo Any more comments? Shall we merge this PR?
It has been a long time and I need to take a review again.