[Iceberg] Add support for $deleted and $delete_file_path metadata columns
Description
Adds support for additional Iceberg metadata as hidden columns. The "$deleted" column shows whether each row is deleted, instead of filtering out deleted rows. The "$delete_file_path" column shows the path of the delete file corresponding to a deleted row, or NULL if the row was not deleted.
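As a rough illustration of the intended semantics (not the connector code), the sketch below models a data file as a list of values and position deletes as a mapping from row position to delete file path. The function name, the in-memory model, and the example path are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model, not the Presto implementation:
# each output row is (value, $deleted, $delete_file_path).
Row = Tuple[str, bool, Optional[str]]


def scan_with_delete_metadata(
    rows: List[str], position_deletes: Dict[int, str]
) -> List[Row]:
    """Return every row, including deleted ones, with the two metadata values.

    position_deletes maps a row position to the path of the delete file
    that marks that position as deleted (an assumed, simplified model).
    """
    result: List[Row] = []
    for position, value in enumerate(rows):
        delete_path = position_deletes.get(position)
        # $deleted is true exactly when some delete file covers this row;
        # $delete_file_path is None (NULL) for rows that were not deleted.
        result.append((value, delete_path is not None, delete_path))
    return result


if __name__ == "__main__":
    data = ["a", "b", "c"]
    deletes = {1: "delete-file-1.parquet"}  # hypothetical path
    for row in scan_with_delete_metadata(data, deletes):
        print(row)
```

The point of the sketch is that deleted rows stay in the output and are only flagged, rather than being filtered out as in a normal scan.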
Feedback is needed on two points: whether this logic is what is intended for these columns, and how the columns should be named ($is_deleted or $deleted).
Motivation and Context
Closes #24733.
Impact
Exposes per-row delete status and the path of the corresponding delete file, which is useful debugging information that was not previously available.
Test Plan
Added tests in IcebergDistributedTestBase. Equality delete files have not been tested yet.
Contributor checklist
- [x] Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
- [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [x] If release notes are required, they follow the release notes guidelines.
- [x] Adequate tests were added if applicable.
- [ ] CI passed.
Release Notes
Please follow release notes guidelines and fill in the release notes below.
== RELEASE NOTES ==
Iceberg Connector Changes
* Add support for ``$deleted`` metadata column
* Add support for ``$delete_file_path`` metadata column
Thanks for the release note entry! Minor formatting nit suggestion.
== RELEASE NOTES ==
Iceberg Connector Changes
* Add support for ``$deleted`` metadata column
* Add support for ``$delete_file_path`` metadata column
Should documentation be added for these newly supported metadata columns, as we have others documented in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/iceberg.rst#extra-hidden-metadata-columns ?
Thanks, I've edited the release notes with the formatting changes. And yes, I believe docs should be added for these metadata columns. I added draft documentation for this pending a decision on the naming of these columns.
Thanks for the doc!
Now that the doc exists in the branch, I was able to test and verify doc links for the release note entry in the local doc build.
== RELEASE NOTES ==
Iceberg Connector Changes
* Add support for :ref:`connector/iceberg:\`\`$deleted\`\` column`.
* Add support for :ref:`connector/iceberg:\`\`$delete_file_path\`\` column`.
Yeah, this doesn't work for equality deletes because of IcebergEqualityDeleteAsJoin. Do you have an opinion on whether we should support these columns with the join or just disable the optimization?
That sounds good. I've added the code to disable the optimization if the columns exist and verified that equality deletes now show up.
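For readers following the thread, the check amounts to something like the sketch below. This is only a simplified model, assuming the decision can be made from the set of column names a query references; the function and constant names are hypothetical and do not mirror the actual optimizer code.

```python
from typing import Iterable

# The two hidden columns added in this PR.
DELETE_METADATA_COLUMNS = frozenset({"$deleted", "$delete_file_path"})


def can_apply_equality_delete_as_join(referenced_columns: Iterable[str]) -> bool:
    """Hypothetical guard: allow the join rewrite only when neither
    delete metadata column is referenced by the query."""
    return DELETE_METADATA_COLUMNS.isdisjoint(referenced_columns)


if __name__ == "__main__":
    print(can_apply_equality_delete_as_join(["id", "name"]))
    print(can_apply_equality_delete_as_join(["id", "$deleted"]))
```

Skipping the rewrite whenever either column is referenced keeps the regular delete handling in place, which is the path that populates these columns.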