iceberg-rust

Feature request: support tracking deletion vectors in manifest files

dentiny opened this issue on Apr 28, 2025 • 1 comment

Is your feature request related to a problem or challenge?

Hi team, this is half a question about the progress of Puffin / deletion vector support, and half a feature request for manifest support.

As stated in the spec:

Delete manifests track deletion vectors individually by the containing file location (file_path), starting offset of the DV blob (content_offset), and total length of the blob (content_size_in_bytes). Multiple deletion vectors can be stored in the same file. There are no restrictions on the data files that can be referenced by deletion vectors in the same Puffin file.
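To make the quoted tracking scheme concrete, here is a minimal Rust sketch. It assumes DV entries can be modeled as plain structs; `DeletionVectorRef`, `group_by_puffin_file`, and `read_blob` are hypothetical names, not iceberg-rust APIs, and the `referenced_data_file` field is an assumption standing in for however an entry identifies the data file it applies to. It shows two things from the spec text: several DVs can live in one Puffin file, and each blob is located purely by `(file_path, content_offset, content_size_in_bytes)`.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of one deletion-vector (DV) entry in a delete manifest.
/// Not an iceberg-rust type; field names follow the spec text quoted above.
#[derive(Debug, Clone)]
struct DeletionVectorRef {
    /// Location of the Puffin file that contains the DV blob.
    file_path: String,
    /// Byte offset where the DV blob starts inside that Puffin file.
    content_offset: u64,
    /// Total length of the DV blob in bytes.
    content_size_in_bytes: u64,
    /// Data file whose rows this DV marks as deleted (assumed field, for illustration).
    referenced_data_file: String,
}

impl DeletionVectorRef {
    fn new(file_path: &str, content_offset: u64, content_size_in_bytes: u64, referenced_data_file: &str) -> Self {
        Self {
            file_path: file_path.to_string(),
            content_offset,
            content_size_in_bytes,
            referenced_data_file: referenced_data_file.to_string(),
        }
    }
}

/// Groups DV entries by the Puffin file that holds them, so each file can be
/// opened once even if it stores DVs for many unrelated data files.
fn group_by_puffin_file(dvs: &[DeletionVectorRef]) -> BTreeMap<&str, Vec<&DeletionVectorRef>> {
    let mut groups: BTreeMap<&str, Vec<&DeletionVectorRef>> = BTreeMap::new();
    for dv in dvs {
        groups.entry(dv.file_path.as_str()).or_default().push(dv);
    }
    groups
}

/// Slices one DV blob out of an in-memory copy of its Puffin file using only
/// (content_offset, content_size_in_bytes). Returns None if the range is out of bounds.
fn read_blob<'a>(puffin_bytes: &'a [u8], dv: &DeletionVectorRef) -> Option<&'a [u8]> {
    let start = usize::try_from(dv.content_offset).ok()?;
    let len = usize::try_from(dv.content_size_in_bytes).ok()?;
    let end = start.checked_add(len)?;
    puffin_bytes.get(start..end)
}

fn main() {
    // Two DVs share puffin-a.bin even though they target different data files.
    let dvs = vec![
        DeletionVectorRef::new("puffin-a.bin", 4, 3, "data-00000.parquet"),
        DeletionVectorRef::new("puffin-a.bin", 7, 2, "data-00001.parquet"),
        DeletionVectorRef::new("puffin-b.bin", 4, 5, "data-00002.parquet"),
    ];

    for (path, entries) in &group_by_puffin_file(&dvs) {
        println!("{path}: {} deletion vector(s)", entries.len());
        for dv in entries {
            println!(
                "  offset {} len {} -> {}",
                dv.content_offset, dv.content_size_in_bytes, dv.referenced_data_file
            );
        }
    }

    // Fake file contents: 4 header bytes, then a 3-byte blob and a 2-byte blob.
    let fake_puffin_a: &[u8] = b"PFA1abcde";
    for dv in &dvs[..2] {
        println!("{:?}", read_blob(fake_puffin_a, dv));
    }
}
```

Grouping by `file_path` reflects the spec's point that one Puffin file may hold DVs for unrelated data files, so a reader could open each Puffin file once rather than once per DV. This is an illustration of the layout only, not a description of existing iceberg-rust behavior.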

My understanding is that, in addition to tracking data files, a manifest file contains records for Puffin files, for example:

{
  "snapshot_id": 4439194908709239593,
  "sequence_number": null,
  "file_sequence_number": null,
  "data_file": {
    "content": 0,
    "file_path": "file:///tmp/iceberg-test/default/test_table/data/iceberg-data-00000.parquet",
    "file_format": "PARQUET",
    ...,
  },
  "puffin_file": {
    "file_path": "file:///tmp/dir/puffin.bin",
    "file_format": "PUFFIN",
    "content": DELETION_VECTOR_TYPE,
    "content_offset": ...,
    "content_size_in_bytes": ...,
  }
}

I'm aware there's an epic tracking Puffin progress, but I don't see any changes on the manifest side in its PRs.

I'm curious: am I misunderstanding the spec, is this already implemented without my being aware of it, or are there plans to implement it in the future?

Thank you!

Describe the solution you'd like

No response

Willingness to contribute

None

dentiny • Apr 28 '25 17:04

Thanks @dentiny for raising this. That epic only covers the Puffin format reader/writer. Statistics and deletion vectors are not supported yet.

liurenjie1024 • Apr 29 '25 06:04

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] • Oct 27 '25 00:10

This issue has been closed because it has not received any activity in the 14 days since being marked as 'stale'.

github-actions[bot] • Nov 11 '25 00:11