iceberg-python icon indicating copy to clipboard operation
iceberg-python copied to clipboard

Format-versioned `Snapshot`s in light of V3 additions

Open smaheshwar-pltr opened this issue 7 months ago • 5 comments

Feature Request / Improvement

While thinking about https://github.com/apache/iceberg-python/issues/1971 and https://github.com/apache/iceberg-python/issues/1972, I realised that V3 introduces new fields to Snapshot - one required for V3 and the other not.

As it stands, it feels inelegant to add the V3 required field as an optional field on the Snapshot class and e.g. check within TableMetadata construction that it's present if the table is V3 (or just not do this at all). I think it might be nicer to encode that information within the typing (model), similar to the TableMetadataV3 excerpt below. https://github.com/apache/iceberg-python/blob/201057e4152e6223717b9ab77d8acea3438e26c1/pyiceberg/table/metadata.py#L552-L560

I'm therefore wondering about about "versioning" Snapshot similar to TableMetadata, so that V3 TableMetadata would contain a list of V3 Snapshots. Then, if V3 snapshot fields are present in V2 metadata, we'd get the benefit of throwing which I think is nice about PyIceberg's TableMetadata Union setup here compared to other implementations.

(I've not fleshed out the details here so not certain this is feasible but dropping an issue for now. Perhaps this has already been discussed / thought about 😄)

smaheshwar-pltr avatar May 06 '25 21:05 smaheshwar-pltr

Hey @smaheshwar-pltr Thanks for bringing this up.

I'm therefore wondering about "versioning" Snapshot similar to TableMetadata, so that V3 TableMetadata would contain a list of V3 Snapshots.

The problem is that from the moment we upgrade a table from {V1,V2} to V3, the field is not there, so we still would run into deserialization issues. For simplicity, I'm leaning towards not versioning because we still would need to check if the fields are not-null, as they stay null after bumping the version to V3: https://iceberg.apache.org/spec/#row-lineage-for-upgraded-tables

Fokko avatar May 06 '25 22:05 Fokko

Ooh thanks a lot for pointing that out @Fokko, I think the upgrade procedure would indeed make versioning complicated. Siding with you now

smaheshwar-pltr avatar May 06 '25 22:05 smaheshwar-pltr

@smaheshwar-pltr Are you interested in adding those fields?

Fokko avatar May 08 '25 07:05 Fokko

@smaheshwar-pltr Are you interested in adding those fields?

Happy for someone else to take a stab!

smaheshwar-pltr avatar May 08 '25 19:05 smaheshwar-pltr

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Nov 12 '25 00:11 github-actions[bot]