Prevent `add_files` from adding a file that's already referenced by the Iceberg Table
Feature Request / Improvement
Currently `add_files` has no check that prevents adding a file that is already referenced by the Iceberg Table.
We should add two checks so that an already-referenced data file cannot be added again as a new manifest entry.
We could do this by running the following two checks before the file addition:
- First check that the list of `file_paths` is unique
- Check that all the files in the `file_paths` aren't referenced by any of the manifests in the current snapshot of the Iceberg Table.
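To make the proposal concrete, here is a rough sketch of the two checks in plain Python. It is only an illustration: `validate_new_file_paths` and its `referenced_paths` argument are hypothetical names, not existing PyIceberg API, and the sketch assumes the paths already referenced by the current snapshot's manifests have been collected into an iterable of strings beforehand.

```python
from collections import Counter
from typing import Iterable, List


def validate_new_file_paths(file_paths: List[str], referenced_paths: Iterable[str]) -> None:
    """Hypothetical helper: raise ValueError if either proposed check fails.

    `referenced_paths` is assumed to hold the data file paths already
    referenced by the manifests of the table's current snapshot.
    """
    # Check 1: the list of file paths passed in must be unique.
    duplicates = sorted(path for path, count in Counter(file_paths).items() if count > 1)
    if duplicates:
        raise ValueError(f"File paths must be unique, found duplicates: {duplicates}")

    # Check 2: none of the new paths may already be referenced by the table.
    already_referenced = sorted(set(file_paths) & set(referenced_paths))
    if already_referenced:
        raise ValueError(f"Files are already referenced by the table: {already_referenced}")
```

In this sketch both checks run before any manifest entry is written, so a failed check would leave the table untouched.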
Hey, I'm new to pyiceberg but would love to take a crack at this.
Hi @amitgilad3 sounds great! I'll get this assigned to you. Please let me know if you'd like some pointers :)
Hey @sungwy, I just created my first PR, #1036. I would really appreciate your review and any suggestions you have, and please let me know if I chose the wrong place to implement my checks.