iceberg-python
partitioned write support
Todo
- [x] support partitioned append()
- [x] support append with identity transform
- [x] fix the scenario where the Arrow table schema is not aligned with the Iceberg schema (finished by others)
- [x] add an integration test for null column partitioning after issue #348 is closed
- [x] avoid sorting the input Arrow table when it is already sorted
- [x] support the partition field in the manifest file (see PartitionKey)
- [x] apply transforms when partitioning, and analyze the algorithm's efficiency when a transform is involved
- [x] support partitioned static overwrite()
  - [x] overwrite entire table
  - [x] overwrite with expression or filter string (specified partition)
  - [x] overwrite filter validation (https://github.com/jqin61/iceberg-python/pull/4): as discussed in the monthly meeting, overwrite will be supported by delete + append. We will support a wider range of filters than Spark Iceberg and might rewrite files when overwriting, rather than only handling IsNull and EqualTo, so this validation is not needed.
- [x] extend summary for partitioned stats (https://github.com/apache/iceberg-python/pull/521)
- [x] support partitioned dynamic overwrite()
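The identity-transform split behind partitioned append can be sketched in plain Python. This is a minimal sketch, assuming rows are dicts rather than Arrow tables; the helpers `split_by_identity_partition`, `is_sorted_by` and `_sort_key` are hypothetical and are not PyIceberg's API. It sorts only when the input is not already ordered by the partition columns, places nulls first so null partition values work, and then cuts the rows into contiguous runs, one per partition value (each run would become one data file).

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for an Arrow table: a list of row dicts.
Row = Dict[str, Any]


def _sort_key(row: Row, keys: List[str]) -> Tuple[Tuple[bool, Any], ...]:
    # Nulls sort first: (False, None) < (True, value) for any non-null value,
    # so None is never compared against a real value.
    return tuple((row[k] is not None, row[k]) for k in keys)


def is_sorted_by(rows: List[Row], keys: List[str]) -> bool:
    """Return True if rows are already ordered by the partition columns."""
    return all(
        _sort_key(rows[i], keys) <= _sort_key(rows[i + 1], keys)
        for i in range(len(rows) - 1)
    )


def split_by_identity_partition(
    rows: List[Row], keys: List[str]
) -> List[Tuple[Tuple[Any, ...], List[Row]]]:
    """Group rows into contiguous runs that share identity-partition values."""
    if not is_sorted_by(rows, keys):
        # Stable sort, so rows keep their input order within a partition.
        rows = sorted(rows, key=lambda r: _sort_key(r, keys))
    return [
        (partition, list(group))
        for partition, group in groupby(rows, key=lambda r: tuple(r[k] for k in keys))
    ]


rows = [
    {"city": "Paris", "n": 1},
    {"city": None, "n": 2},
    {"city": "Oslo", "n": 3},
    {"city": "Paris", "n": 4},
]
for partition, group in split_by_identity_partition(rows, ["city"]):
    print(partition, [r["n"] for r in group])
# (None,) [2]
# ('Oslo',) [3]
# ('Paris',) [1, 4]
```

The sortedness check is a single linear pass, so input that already arrives ordered by the partition columns skips the O(n log n) sort entirely.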
As discussed in the monthly community sync, this work will be broken down into 4 PRs:
- Partitioned append with identity transform
- Dynamic overwrite using delete + append, 2 snapshots in one commit
- Hidden partitioning support (for slicing the arrow table, manifest file entry.partition, data file path)
- Static overwrite using delete + append, 2 snapshots in one commit
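Dynamic overwrite as delete + append, with 2 snapshots in one commit, can be modeled with a toy in-memory table. This is a hypothetical sketch, not PyIceberg's API: `Table`, `Snapshot`, `DataFile` and `dynamic_overwrite` are illustrative names. Only partitions that appear in the incoming files are replaced. The delete snapshot drops their existing files, the append snapshot adds the new ones, and both are published by a single commit.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

# Hypothetical toy model; none of these names are PyIceberg APIs.
PartitionKey = Tuple[Any, ...]


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: PartitionKey


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    operation: str  # "append" or "delete"
    files: Tuple[DataFile, ...]  # live data files as of this snapshot


@dataclass
class Table:
    snapshots: List[Snapshot] = field(default_factory=list)

    def current_files(self) -> List[DataFile]:
        return list(self.snapshots[-1].files) if self.snapshots else []

    def commit(self, new_snapshots: List[Snapshot]) -> None:
        # One commit publishes every snapshot in the list together.
        self.snapshots.extend(new_snapshots)


def dynamic_overwrite(table: Table, new_files: List[DataFile]) -> None:
    """Replace only the partitions touched by new_files (delete + append)."""
    touched = {f.partition for f in new_files}
    kept = tuple(f for f in table.current_files() if f.partition not in touched)
    next_id = len(table.snapshots) + 1
    delete_snapshot = Snapshot(next_id, "delete", kept)
    append_snapshot = Snapshot(next_id + 1, "append", kept + tuple(new_files))
    # 2 snapshots, 1 commit: the current state moves straight from old to new files.
    table.commit([delete_snapshot, append_snapshot])


table = Table()
table.commit([Snapshot(1, "append", (
    DataFile("a.parquet", ("2024-01-01",)),
    DataFile("b.parquet", ("2024-01-02",)),
))])
dynamic_overwrite(table, [DataFile("c.parquet", ("2024-01-02",))])
print([f.path for f in table.current_files()])  # ['a.parquet', 'c.parquet']
print([s.operation for s in table.snapshots])   # ['append', 'delete', 'append']
```

Because the partition for `2024-01-01` receives no new data, its file survives; only the `2024-01-02` partition is swapped out. Static overwrite would differ in that the set of partitions to delete comes from a user-supplied filter rather than from the incoming data.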