iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

feat: support position delete writer

Open ZENOTME opened this issue 1 year ago • 10 comments

Complete #340

ZENOTME avatar Nov 19 '24 13:11 ZENOTME

There are two kinds of writers in iceberg:

  1. Plain position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
  2. Sorting position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49

It seems that this pr tries to implement 2, while there are some missing part there. I would suggest to implement 1 first as it's easier, what do you think?

Is position delete must be sorted or it just be optional? From the iceberg spec, it looks like it must be sorted. https://iceberg.apache.org/spec/#position-delete-files:~:text=The%20rows%20in%20the%20delete%20file%20must%20be%20sorted%20by%20file_path%20then%20pos%20to%20optimize%20filtering%20rows%20while%20scanning.

ZENOTME avatar Nov 27 '24 13:11 ZENOTME

There are two kinds of writers in iceberg:

  1. Plain position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
  2. Sorting position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49

It seems that this pr tries to implement 2, while there are some missing part there. I would suggest to implement 1 first as it's easier, what do you think?

Is position delete must be sorted or it just be optional? From the iceberg spec, it looks like it must be sorted. https://iceberg.apache.org/spec/#position-delete-files:~:text=The%20rows%20in%20the%20delete%20file%20must%20be%20sorted%20by%20file_path%20then%20pos%20to%20optimize%20filtering%20rows%20while%20scanning.

Yes, it's required in spec, but some compute engine could sort this before passing to writer, and writer doesn't need to handle sorting itself.

liurenjie1024 avatar Nov 28 '24 02:11 liurenjie1024

There are two kinds of writers in iceberg:

  1. Plain position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49
  2. Sorting position delete writer: https://github.com/apache/iceberg/blob/da2ad389fd9ba8222f6fb3f57922209c239a7045/core/src/main/java/org/apache/iceberg/deletes/PositionDeleteWriter.java#L49

It seems that this pr tries to implement 2, while there are some missing part there. I would suggest to implement 1 first as it's easier, what do you think?

Is position delete must be sorted or it just be optional? From the iceberg spec, it looks like it must be sorted. https://iceberg.apache.org/spec/#position-delete-files:~:text=The%20rows%20in%20the%20delete%20file%20must%20be%20sorted%20by%20file_path%20then%20pos%20to%20optimize%20filtering%20rows%20while%20scanning.

Yes, it's required in spec, but some compute engine could sort this before passing to writer, and writer doesn't need to handle sorting itself.

Make sense. Let's implement 1 first

ZENOTME avatar Nov 28 '24 03:11 ZENOTME

I think we can resolve #741 first before this PR.

ZENOTME avatar Dec 03 '24 02:12 ZENOTME

@ZENOTME Are you still working on this? I'm looking to work on one of the two writers

jonathanc-n avatar Feb 12 '25 08:02 jonathanc-n

@ZENOTME Are you still working on this? I'm looking to work on one of the two writers

Sorry for the late, I will work on this later. Would you like to work on sorting position delete writer after this PR?

ZENOTME avatar Feb 12 '25 12:02 ZENOTME

Hi @liurenjie1024 @jonathanc-n. I have fixed this PR. It's ready for review.

ZENOTME avatar Feb 12 '25 13:02 ZENOTME

hi @liurenjie1024, is there other improvement for this PR? I think it's ready to go.

ZENOTME avatar Mar 03 '25 08:03 ZENOTME

@liurenjie1024 @Xuanwo Any more comments? Shall we merge this PR?

manuzhang avatar Nov 02 '25 15:11 manuzhang

@liurenjie1024 @Xuanwo Any more comments? Shall we merge this PR?

It has been a long time and I need to take a review again.

liurenjie1024 avatar Nov 03 '25 10:11 liurenjie1024