[Iceberg] Add support for $deleted and $delete_file_path metadata columns

Open xieandrew opened this issue 6 months ago • 6 comments

Description

Adds support for additional Iceberg metadata as hidden columns. The "$deleted" column shows whether each row is deleted, instead of filtering out deleted rows. The "$delete_file_path" column shows the path of the delete file corresponding to a deleted row, or NULL if the row was not deleted.
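To make the intended semantics concrete, here is a minimal Python sketch that models them outside of Presto. It is not the connector code: the function name `scan_with_delete_metadata`, the position-to-delete-file mapping, and the sample path are all hypothetical, and only position deletes are modeled.

```python
from typing import Dict, List

# Hypothetical stand-in for position-delete metadata: maps a row position
# within a data file to the path of the delete file that marks it deleted.
PositionDeletes = Dict[int, str]


def scan_with_delete_metadata(rows: List[dict], position_deletes: PositionDeletes) -> List[dict]:
    """Return every row, annotated with delete metadata instead of filtered."""
    result = []
    for position, row in enumerate(rows):
        delete_file = position_deletes.get(position)
        annotated = dict(row)
        # "$deleted" is true only when some delete file covers this row.
        annotated["$deleted"] = delete_file is not None
        # None models SQL NULL for rows that were not deleted.
        annotated["$delete_file_path"] = delete_file
        result.append(annotated)
    return result


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
# Hypothetical delete file path, for illustration only.
deletes = {1: "s3://bucket/tbl/data/delete-0001.parquet"}

annotated_rows = scan_with_delete_metadata(rows, deletes)
for annotated_row in annotated_rows:
    print(annotated_row)
```

In this model the deleted row (id 2) is still returned, flagged with "$deleted" set to true and carrying its delete file path, while the other rows carry a null path.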

Feedback is needed to confirm that this is the intended behavior for these columns, and to settle on their naming ($is_deleted or $deleted).

Motivation and Context

Closes #24733.

Impact

Exposes row-level delete information (deletion status and delete file path) that is useful for debugging and was not previously available.

Test Plan

Added tests in IcebergDistributedTestBase. Equality delete files have not been tested yet.

Contributor checklist

  • [x] Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • [x] If release notes are required, they follow the release notes guidelines.
  • [x] Adequate tests were added if applicable.
  • [ ] CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for ``$deleted`` metadata column
* Add support for ``$delete_file_path`` metadata column

xieandrew avatar Jun 10 '25 19:06 xieandrew

Thanks for the release note entry! Minor formatting nit suggestion.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for ``$deleted`` metadata column
* Add support for ``$delete_file_path`` metadata column

steveburnett avatar Jun 12 '25 17:06 steveburnett

Should documentation be added for these newly supported metadata columns, as we have others documented in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/iceberg.rst#extra-hidden-metadata-columns ?

steveburnett avatar Jun 12 '25 17:06 steveburnett

Thanks, I've edited the release notes with the formatting changes. And yes, I believe docs should be added for these metadata columns. I added draft documentation for this pending a decision on the naming of these columns.

xieandrew avatar Jun 12 '25 18:06 xieandrew

Thanks for the doc!

Now that the doc exists in the branch, I was able to test and verify doc links for the release note entry in the local doc build.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for :ref:`connector/iceberg:\`\`$deleted\`\` column`.
* Add support for :ref:`connector/iceberg:\`\`$delete_file_path\`\` column`.

steveburnett avatar Jun 12 '25 18:06 steveburnett

Yeah this doesn't work for equality deletes because of IcebergEqualityDeleteAsJoin. Do you have an opinion on whether we should support these columns with the join or just disable the optimization?

xieandrew avatar Jun 13 '25 23:06 xieandrew

That sounds good. I've added the code to disable the optimization if the columns exist, and verified that equality deletes now show up.

xieandrew avatar Jun 18 '25 19:06 xieandrew