iceberg
zorder does not work with sub fields
Apache Iceberg version
1.4.2
Query engine
Spark
Please describe the bug 🐞
The rewrite_data_files procedure with zorder does not work on sub-fields (fields nested inside a struct column).
CALL users.system.rewrite_data_files(
  table => 'users.jcc.flow',
  options => map(
    'max-concurrent-file-group-rewrites', '20',
    'partial-progress.enabled', 'true',
    'rewrite-all', 'true'),
  strategy => 'sort',
  sort_order => 'zorder(SRC_IP.v4, DST_IP.v4)',
  where => "END_TIME >= TIMESTAMP '2024-02-12'
    AND END_TIME < TIMESTAMP '2024-02-12' + INTERVAL 1 DAY"
)
I get the error: java.lang.IllegalArgumentException: SRC_IP.v4 does not exist
The schema of the table is SRC_IP: struct<v4:bigint,v6:binary>, DST_IP: struct<v4:bigint,v6:binary>
Would this fit your use case? https://github.com/apache/iceberg/pull/9818/files
Seems like it would. I'm not a reviewer but I do want the fix :-)
The issue with nested fields for zorder still exists. Any chance you have time to complete the PR?
@singhpk234 just following up on the lack of support for nested fields when applying zordering
@RussellSpitzer do you think the PR mentioned by @singhpk234 (https://github.com/apache/iceberg/pull/9818/files) could get merged?
I also need this. Tried to work around this by setting the default sort order of the table, but the table spec doesn't support zordering.
This is still an issue for us as well. The rewrite_data_files procedure says it cannot find the column we specify in our zorder. It only accepts root columns.
We should re-open that PR. I think no one got around to reviewing it before and it just got auto-closed
@RussellSpitzer We have created a fork of Iceberg with the PR from @singhpk234 and it works great.
For over a year now, we have been working around the issue by copying our nested columns to the root of the table. We would love for this PR to make it into the code base so we can stop doing that and also not have a custom fork.
Any chance this PR gets looked at and approved? Thanks.
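For anyone else hitting the same error, the workaround described above (copying the nested fields to root-level columns and z-ordering on those) might look roughly like the sketch below. This is only an illustration under assumptions, not necessarily how it was done in the comment above: the column names SRC_IP_V4 and DST_IP_V4 are hypothetical, the UPDATE assumes the Iceberg Spark SQL extensions are enabled, and writers would also need to populate the new columns for incoming data.

```sql
-- Assumption: SRC_IP_V4 and DST_IP_V4 are hypothetical root-level copies
-- of the nested fields SRC_IP.v4 and DST_IP.v4.
ALTER TABLE users.jcc.flow ADD COLUMNS (SRC_IP_V4 bigint, DST_IP_V4 bigint);

-- Backfill the copies from the nested fields.
-- This rewrites the affected data files, so it can be expensive on large tables.
UPDATE users.jcc.flow
SET SRC_IP_V4 = SRC_IP.v4,
    DST_IP_V4 = DST_IP.v4;

-- Z-order on the root-level copies, which rewrite_data_files can resolve.
CALL users.system.rewrite_data_files(
  table => 'users.jcc.flow',
  strategy => 'sort',
  sort_order => 'zorder(SRC_IP_V4, DST_IP_V4)'
);
```

The trade-off is duplicated data and an extra step in every write path, which is why a fix that lets zorder resolve nested fields directly would be preferable.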
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'