Feature request: manifest file can track deletion vector
Is your feature request related to a problem or challenge?
Hi team, this feature request is half a question about Puffin / deletion vector progress and half a request for manifest support.
As stated in the spec:
Delete manifests track deletion vectors individually by the containing file location (file_path), starting offset of the DV blob (content_offset), and total length of the blob (content_size_in_bytes). Multiple deletion vectors can be stored in the same file. There are no restrictions on the data files that can be referenced by deletion vectors in the same Puffin file.
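To make the three fields in that excerpt concrete, here is a minimal sketch of how a reader could locate one DV blob from them. The `DeletionVectorRef` class and `read_dv_blob` function are hypothetical names used only for illustration, not part of any Iceberg library; the sketch also assumes a plain local path (no `file://` scheme) and returns the raw bytes without decoding the vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeletionVectorRef:
    """Hypothetical holder for the three fields named in the spec excerpt."""

    file_path: str              # location of the containing Puffin file
    content_offset: int         # starting offset of the DV blob
    content_size_in_bytes: int  # total length of the DV blob


def read_dv_blob(ref: DeletionVectorRef) -> bytes:
    """Return the raw bytes of one DV blob; decoding them is out of scope."""
    # Assumption: a plain local path. A real reader would go through
    # the table's file IO abstraction instead of open().
    with open(ref.file_path, "rb") as f:
        f.seek(ref.content_offset)
        blob = f.read(ref.content_size_in_bytes)
    if len(blob) != ref.content_size_in_bytes:
        raise ValueError("blob is shorter than content_size_in_bytes")
    return blob
```

Because each reference carries its own offset and length, several references can point into the same file, which matches the statement that multiple deletion vectors can be stored in one Puffin file.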
My understanding is that the manifest file, apart from tracking data files, also contains records for Puffin files. For example:
{
  "snapshot_id": 4439194908709239593,
  "sequence_number": null,
  "file_sequence_number": null,
  "data_file": {
    "content": 0,
    "file_path": "file:///tmp/iceberg-test/default/test_table/data/iceberg-data-00000.parquet",
    "file_format": "PARQUET",
    ...,
  },
  "puffin_file": {
    "file_path": "file:///tmp/dir/puffin.bin",
    "file_format": "PUFFIN",
    "content": DELETION_VECTOR_TYPE,
    "content_offset": ...,
    "content_size_in_bytes": ...,
  }
}
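Since several deletion vectors may live in one Puffin file, a reader consuming entries like this would probably want to group them by `file_path` so each Puffin file is opened once. The sketch below illustrates that idea with plain dicts. The function name and the `data_file_path` key (the data file a vector applies to) are assumptions made for illustration; only `file_path`, `content_offset`, and `content_size_in_bytes` come from the spec excerpt.

```python
from collections import defaultdict


def group_by_puffin_file(entries):
    """Map each Puffin file_path to its (offset, size, data file) ranges.

    Ranges are sorted by offset so one file can be read in a single pass.
    """
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["file_path"]].append(
            (
                entry["content_offset"],
                entry["content_size_in_bytes"],
                entry["data_file_path"],  # hypothetical key, see lead-in
            )
        )
    return {path: sorted(ranges) for path, ranges in grouped.items()}
```

This is only one possible layout; it does not claim to reflect how any existing implementation stores or scans these entries.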
I'm aware there's an epic tracking Puffin progress, but I don't see any manifest-side changes in its PRs.
I'm curious: am I misunderstanding the spec, is this already implemented without my being aware of it, or is there a plan to implement it in the future?
Thank you!
Describe the solution you'd like
No response
Willingness to contribute
None
Thanks @dentiny for raising this. The epic only covers the Puffin format reader/writer. Statistics and deletion vectors are not supported yet.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'