
zorder does not work with sub fields

Open cccs-jc opened this issue 1 year ago • 3 comments

Apache Iceberg version

1.4.2

Query engine

Spark

Please describe the bug 🐞

The rewrite_data_files procedure with zorder does not work on sub-fields (fields nested inside a struct column).

CALL users.system.rewrite_data_files(
        table => 'users.jcc.flow',
        options => map('max-concurrent-file-group-rewrites', '20',
                       'partial-progress.enabled', 'true',
                       'rewrite-all', 'true'),
        strategy => 'sort',
        sort_order => 'zorder(SRC_IP.v4, DST_IP.v4)',
        where => "END_TIME >= TIMESTAMP '2024-02-12'
            AND END_TIME < TIMESTAMP '2024-02-12' + INTERVAL 1 DAY"
        )

I get the error java.lang.IllegalArgumentException: SRC_IP.v4 does not exist

The schema of the table is SRC_IP: struct<v4:bigint,v6:binary>, DST_IP: struct<v4:bigint,v6:binary>

cccs-jc avatar Mar 21 '24 18:03 cccs-jc

Would this fit your use case: https://github.com/apache/iceberg/pull/9818/files

singhpk234 avatar Apr 05 '24 15:04 singhpk234

Seems like it would. I'm not a reviewer but I do want the fix :-)

cccs-jc avatar Apr 05 '24 18:04 cccs-jc

The issue with nested fields for zorder still exists. Any chance you have time to complete the PR?

cccs-jc avatar Jun 24 '24 09:06 cccs-jc

@singhpk234 just following up on the lack of support for nested fields when applying zordering

cccs-jc avatar Jul 16 '24 17:07 cccs-jc

@RussellSpitzer do you think the PR mentioned by @singhpk234, https://github.com/apache/iceberg/pull/9818/files, could get merged?

cccs-jc avatar Aug 23 '24 11:08 cccs-jc

I also need this. Tried to work around this by setting the default sort order of the table, but the table spec doesn't support zordering.

David-N-Perkins avatar Feb 07 '25 20:02 David-N-Perkins

This is still an issue for us as well. The rewrite_data_files procedure says it cannot find the column we specify in our zorder. It only accepts root columns.

cccs-jc avatar Apr 10 '25 15:04 cccs-jc

We should re-open that PR. I think no one got around to reviewing it before and it just got auto-closed

RussellSpitzer avatar Apr 10 '25 20:04 RussellSpitzer

@RussellSpitzer We have created a fork of Iceberg with the PR from @singhpk234 and it works great.

For over a year now, we have been working around the issue by copying our nested columns to the root of the table. We would love for this PR to make it into the code base so we can stop doing that and also not have a custom fork.
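For readers hitting the same error, the workaround described above might look roughly like the following. This is only a sketch under stated assumptions: the root column names SRC_IP_V4 and DST_IP_V4 are hypothetical, the backfill relies on UPDATE being available through the Iceberg Spark SQL extensions, and the date filter is copied from the original report.

```sql
-- Hypothetical root-level copies of the nested fields (names are assumptions).
ALTER TABLE users.jcc.flow ADD COLUMNS (SRC_IP_V4 bigint, DST_IP_V4 bigint);

-- Backfill the copies from the struct fields for the day being rewritten.
-- Writers would also need to populate these columns for new data.
UPDATE users.jcc.flow
SET SRC_IP_V4 = SRC_IP.v4,
    DST_IP_V4 = DST_IP.v4
WHERE END_TIME >= TIMESTAMP '2024-02-12'
  AND END_TIME < TIMESTAMP '2024-02-12' + INTERVAL 1 DAY;

-- zorder now references root columns only, which the procedure accepts.
CALL users.system.rewrite_data_files(
        table => 'users.jcc.flow',
        strategy => 'sort',
        sort_order => 'zorder(SRC_IP_V4, DST_IP_V4)',
        where => "END_TIME >= TIMESTAMP '2024-02-12'
            AND END_TIME < TIMESTAMP '2024-02-12' + INTERVAL 1 DAY"
        )
```

The cost is duplicated data and an extra write path to maintain, which is why the fix in the PR is preferable.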

Any chance this PR gets looked at and approved? Thanks

cccs-jc avatar Apr 16 '25 13:04 cccs-jc

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Oct 14 '25 00:10 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Oct 28 '25 00:10 github-actions[bot]