Support Isolation Levels and Concurrency Safety Validation Checks
Feature Request / Improvement
Support enforcing Isolation Levels from a specified snapshot ID
https://iceberg.apache.org/docs/latest/spark-configuration/#write-options
There has been sustained interest in running multiple PyIceberg applications concurrently, which requires proper support for optimistic concurrency.
I think the best place to start is by implementing the individual validation functions.
Once these are complete, we'll be able to introduce Isolation Levels and correctly implement the validation logic in the `_OverwriteFiles` snapshot producer, similar to the Java implementation.
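For context on how the pieces fit together, here is a minimal sketch of how an isolation level could select which validation checks an overwrite commit runs, loosely modeled on Java's `BaseOverwriteFiles`. The enum and check names below are illustrative assumptions, not PyIceberg's actual API:

```python
# Illustrative sketch only: how an isolation level could gate which
# validation checks an overwrite commit performs. Loosely modeled on
# Java's BaseOverwriteFiles; none of these names are PyIceberg's API.
from enum import Enum


class IsolationLevel(Enum):
    SERIALIZABLE = "serializable"
    SNAPSHOT = "snapshot"


def checks_for(level: IsolationLevel) -> list[str]:
    # Both levels must fail if files this commit rewrites or deletes
    # were concurrently deleted; SERIALIZABLE must additionally fail if
    # new data matching the overwrite filter was added concurrently.
    checks = ["validate_deleted_data_files", "validate_no_new_delete_files"]
    if level is IsolationLevel.SERIALIZABLE:
        checks.append("validate_added_data_files")
    return checks
```

The design point is that SNAPSHOT isolation only has to detect conflicting deletes, while SERIALIZABLE must also reject concurrently added data in the overwritten range.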
Hi, I'm interested in working on this!
Some relevant links to the Java implementation (a rough Python sketch follows below):
- `validateNewDataFiles` flag -> `MergingSnapshotProducer.validateAddedDataFiles`
- `validateNewDeletes` flag -> `MergingSnapshotProducer.validateDeletedDataFiles`
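For anyone picking these up, here is a minimal, self-contained sketch of what the first check could look like in Python. `SnapshotStub`, `ancestors_between`, and `validate_added_data_files` are hypothetical stand-ins for illustration; the real implementation would walk manifests via PyIceberg's Snapshot/manifest APIs and apply a conflict detection filter, as the Java version does:

```python
# Hypothetical sketch of validateAddedDataFiles in Python. SnapshotStub
# and the helpers below are stand-ins, not PyIceberg's real classes;
# partition/expression filtering of conflicting files is omitted.
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


class ValidationException(Exception):
    """Raised when a concurrent snapshot conflicts with this commit."""


@dataclass
class SnapshotStub:
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    operation: str  # e.g. "append", "overwrite", "replace", "delete"
    added_data_files: List[str]


def ancestors_between(
    current: SnapshotStub,
    starting_snapshot_id: int,
    snapshots_by_id: Dict[int, SnapshotStub],
) -> Iterator[SnapshotStub]:
    """Yield snapshots committed after the snapshot this writer read
    from, walking parent pointers back from the current table state.
    (A real implementation should fail if starting_snapshot_id is not
    an ancestor of the current snapshot.)"""
    snapshot: Optional[SnapshotStub] = current
    while snapshot is not None and snapshot.snapshot_id != starting_snapshot_id:
        yield snapshot
        parent_id = snapshot.parent_snapshot_id
        snapshot = snapshots_by_id.get(parent_id) if parent_id is not None else None


def validate_added_data_files(
    current: SnapshotStub,
    starting_snapshot_id: int,
    snapshots_by_id: Dict[int, SnapshotStub],
) -> None:
    """Fail if any concurrently committed snapshot added data files
    that this commit did not account for."""
    conflicting_ops = {"append", "overwrite", "replace"}
    for snapshot in ancestors_between(current, starting_snapshot_id, snapshots_by_id):
        if snapshot.operation in conflicting_ops and snapshot.added_data_files:
            raise ValidationException(
                f"Conflicting files added by snapshot {snapshot.snapshot_id}: "
                f"{snapshot.added_data_files}"
            )
```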
Hey @sungwy, I would like to contribute by working on these.
Is there one I can pick up and start looking into, such as one of the initial validation implementations?
@guptaakashdeep yes, I don't think there's a particular order we should implement these with, so please feel free to assign yourself to the one you find most interesting!
Sung
Thanks @sungwy! Is there an existing class where I should implement these validation functions, or should we add them directly in `snapshot.py`?
I think we could create a new module, `pyiceberg.table.update.validate`, and add these validation checks there. What do you think, @guptaakashdeep?
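Something like this skeleton, where the function names mirror the Java checks (the signatures are just a first guess, not a settled API):

```python
# Hypothetical skeleton for pyiceberg/table/update/validate.py. The
# function names mirror Java's MergingSnapshotProducer checks; the
# signatures are a first guess, not a settled API.
from typing import Optional

from pyiceberg.expressions import BooleanExpression
from pyiceberg.table import Table
from pyiceberg.table.snapshots import Snapshot


def validate_added_data_files(
    table: Table,
    starting_snapshot: Snapshot,
    data_filter: Optional[BooleanExpression],
) -> None:
    """Fail if snapshots committed after starting_snapshot added data
    files matching data_filter (mirrors validateAddedDataFiles)."""
    raise NotImplementedError


def validate_deleted_data_files(
    table: Table,
    starting_snapshot: Snapshot,
    data_filter: Optional[BooleanExpression],
) -> None:
    """Fail if snapshots committed after starting_snapshot deleted data
    files matching data_filter (mirrors validateDeletedDataFiles)."""
    raise NotImplementedError
```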
Sounds good @sungwy!!
@guptaakashdeep @sungwy see https://github.com/apache/iceberg-python/pull/1935, which should provide the building blocks needed to crank out the 4 sub-issues.
Also going to crank out a manifest group implementation today.
Edit: @sungwy it looks like the `ManifestGroup.entries` method is extremely similar to the `DataScan` defined in the table `__init__.py` file... What do you think?
Is there any update on this feature?
Also curious if there is any movement on this PR. I currently have some workarounds for concurrent writes implemented, but they are very inefficient.