iceberg-python

Incremental Append Scan

Open hililiwei opened this issue 1 year ago • 6 comments

Hi @Fokko, long time no see. 😄 I have written some preliminary code for incremental reading, but it still needs a lot of work. I would like to discuss it with you at an early stage, though, as that will help me stay on the right track. Could you please take a look when you have a chance? Thank you.

hililiwei avatar Mar 19 '24 11:03 hililiwei

In the latest commit, I reworked the class inheritance by introducing a new base class, BaseIncrementalScan, which inherits from TableScan. I also pushed snapshot_id down to DataScan and moved a few methods around (which might cause some backward compatibility issues 💔). How do you think I can improve it? @Fokko

hililiwei avatar Mar 22 '24 08:03 hililiwei

Sorry for the late update. I've adjusted the code based on the latest comments. Could you please take a look?

hililiwei avatar Apr 30 '24 10:04 hililiwei

@hililiwei I'm sorry, this also fell off my radar.

Fokko avatar May 23 '24 09:05 Fokko

I managed to get a poor man's append-scan with this: https://github.com/apache/iceberg-python/issues/240#issuecomment-2248323987

Looking at this PR, wouldn't it be simpler to implement append-scan in the API by adding an append_scan method to Table, then refactoring plan_files to take an optional snapshot_id, and providing a lightweight AppendScan class that makes two calls to plan_files and then compares the results?

In my case there was no need to touch __eq__ or __hash__.
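
A minimal sketch of that idea, for illustration only. It does not use pyiceberg: `FileTask`, the `PlanFiles` callable, and the toy snapshot history are hypothetical stand-ins, and the `AppendScan` constructor signature is an assumption rather than anything from the PR. The two planned file sets are compared by file path string, which is why no custom `__eq__` or `__hash__` is needed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FileTask:
    """Hypothetical stand-in for a planned file task; only the path matters here."""
    file_path: str


# Hypothetical signature: plan_files(snapshot_id) -> tasks visible at that snapshot.
PlanFiles = Callable[[int], Iterable[FileTask]]


class AppendScan:
    """Lightweight append scan: plan both snapshots, keep files only in the newer one."""

    def __init__(self, plan_files: PlanFiles, from_snapshot_id: int, to_snapshot_id: int) -> None:
        self._plan_files = plan_files
        self._from = from_snapshot_id
        self._to = to_snapshot_id

    def plan_files(self) -> List[FileTask]:
        # Call 1: everything visible at the starting snapshot (exclusive bound).
        old_paths = {task.file_path for task in self._plan_files(self._from)}
        # Call 2: everything visible at the ending snapshot; keep only what is new.
        return [task for task in self._plan_files(self._to) if task.file_path not in old_paths]


# Toy table history used only to exercise the sketch (assumed data, not a real table).
_HISTORY = {
    1: [FileTask("data/a.parquet")],
    2: [FileTask("data/a.parquet"), FileTask("data/b.parquet")],
    3: [FileTask("data/a.parquet"), FileTask("data/b.parquet"), FileTask("data/c.parquet")],
}


def toy_plan_files(snapshot_id: int) -> List[FileTask]:
    return _HISTORY[snapshot_id]


appended = AppendScan(toy_plan_files, from_snapshot_id=1, to_snapshot_id=3).plan_files()
print([t.file_path for t in appended])  # ['data/b.parquet', 'data/c.parquet']
```

Note that a plain set difference like this is only equivalent to an append scan when every snapshot in the range is an append; if files were rewritten or compacted in between, the rewritten files would also show up as "new".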

glesperance avatar Jul 24 '24 15:07 glesperance