Fix Iceberg $files metadata table not showing delete files

Open 0xffmeta opened this issue 2 years ago • 36 comments

Description

This PR aims to fix the $files table not showing delete files for the Iceberg v2 format. See https://github.com/trinodb/trino/issues/16233

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Fix `$files` table not showing delete files for iceberg v2 format. ({issue}`[16233]`)

0xffmeta avatar Feb 23 '23 11:02 0xffmeta

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Feb 23 '23 11:02 cla-bot[bot]

FYI: I have already signed the CLA, and I think it is still being processed.

0xffmeta avatar Feb 27 '23 06:02 0xffmeta

Thanks @ebyhr for the review. I have updated the PR according to the comment. Please take a look when you have some time.

0xffmeta avatar Feb 27 '23 13:02 0xffmeta

@krvikash Thanks for the review. I just updated the PR accordingly. Please take a look when you have some time.

0xffmeta avatar Mar 03 '23 09:03 0xffmeta

@cla-bot check

ebyhr avatar Mar 07 '23 22:03 ebyhr

The cla-bot has been summoned, and re-checked this pull request!

cla-bot[bot] avatar Mar 07 '23 22:03 cla-bot[bot]

Hi @0xffmeta, I'm adding a test to validate the $files system table output and fix the problem raised in https://github.com/trinodb/trino/issues/16473. To do so, I'll cherry-pick your first commit, "Wrap collection values in array blocks". I hope you don't mind.

krvikash avatar Mar 13 '23 10:03 krvikash

@krvikash Thanks for the reminder. No problem at all.

0xffmeta avatar Mar 13 '23 10:03 0xffmeta

@0xffmeta FYI https://github.com/trinodb/trino/pull/16519/files

krvikash avatar Mar 13 '23 12:03 krvikash

Hi @0xffmeta, https://github.com/trinodb/trino/pull/16519 is merged now. Could you please rebase and resolve the conflicts?

krvikash avatar Mar 14 '23 10:03 krvikash

@krvikash I just updated this PR. Please take a look when you have some time.

0xffmeta avatar Mar 15 '23 04:03 0xffmeta

Hi @krvikash, just want to check if this PR can be merged or not.

0xffmeta avatar Mar 24 '23 07:03 0xffmeta

Hi @krvikash, are you able to review this PR again to see if this can be merged? Thanks.

0xffmeta avatar Apr 19 '23 11:04 0xffmeta

@0xffmeta, Sorry for the late response. Overall LGTM.

@alexjo2144 Could you please take a look?

krvikash avatar Apr 26 '23 10:04 krvikash

@ebyhr @krvikash @alexjo2144 ping. Is there any additional help or input needed for this fix to be merged?

vakarisbk avatar Jul 03 '23 11:07 vakarisbk

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jan 17 '24 17:01 github-actions[bot]

:wave: @0xffmeta - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers and maintainers to help.

cc @bitsondatadev @findepi @alexjo2144 @findinpath

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

mosabua avatar Jan 17 '24 19:01 mosabua

This is still relevant. The Iceberg $files table assigns content=0 for deleted files.

alaturqua avatar Feb 12 '24 00:02 alaturqua

@alexjo2144 @krvikash This is important to fix.

One thing I've observed people trying to do is build some way to identify whether their table needs to be optimized, for example to remove delete files.

A simple check is whether the total size of delete files, or the count of records in delete files, is above some threshold. Unfortunately, Trino cannot be used for this today because $files doesn't show delete files.

hashhar avatar Mar 01 '24 09:03 hashhar
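
The threshold check described in the comment above could be sketched roughly as follows. This is only an illustration, not code from the PR: `FileRow`, `needs_optimize`, and both threshold parameters are hypothetical names, and the rows are assumed to come from a `$files`-style listing that exposes `content`, `record_count`, and `file_size_in_bytes`. The content codes follow the Iceberg spec (0 = data, 1 = position deletes, 2 = equality deletes).

```python
from dataclasses import dataclass
from typing import Iterable

# Iceberg spec content codes: 0 = data, 1 = position deletes, 2 = equality deletes.
DELETE_CONTENT_CODES = {1, 2}


@dataclass(frozen=True)
class FileRow:
    """Hypothetical holder for the few $files columns this sketch needs."""
    content: int
    record_count: int
    file_size_in_bytes: int


def needs_optimize(
    rows: Iterable[FileRow],
    max_delete_bytes: int,
    max_delete_records: int,
) -> bool:
    """Return True when delete files exceed either threshold.

    Both thresholds are assumptions chosen by the caller, not Trino settings.
    """
    delete_rows = [r for r in rows if r.content in DELETE_CONTENT_CODES]
    total_bytes = sum(r.file_size_in_bytes for r in delete_rows)
    total_records = sum(r.record_count for r in delete_rows)
    return total_bytes > max_delete_bytes or total_records > max_delete_records
```

Note that such a check only works if delete files appear in the listing with a non-zero `content` value. If they are missing, or reported with `content = 0` as described earlier in the thread, the function never reports that optimization is needed.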

Maybe @electrum @findepi or @findinpath can help out here.

Also @0xffmeta could you rebase?

mosabua avatar Mar 22 '24 18:03 mosabua
