trino
Handle sort order with nested columns on iceberg table
Description
Previously, parseSortFields in SortFieldUtils collected field ids only from top-level columns and did not consider the fields of nested types. When a query's sorted_by property referenced a field of a nested type, Trino threw an exception saying that the column does not exist, because the nested column's field id was missing from the baseColumnFieldIds set.
This PR fixes the issue by recursively collecting the field ids of table columns whose type is a nested type.
Fix: #19620
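The recursive collection described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual change in SortFieldUtils: the `Field` record is a hypothetical stand-in for an Iceberg schema field, and `collectFieldIds` is an assumed name used only for this example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NestedFieldIds {
    // Hypothetical stand-in for a schema field: an id, a name, and child fields
    // (empty for primitive types). Not the Iceberg API.
    record Field(int id, String name, List<Field> children) {
        static Field leaf(int id, String name) {
            return new Field(id, name, List.of());
        }
    }

    // Collects the ids of the given fields and, recursively, of every nested child.
    static Set<Integer> collectFieldIds(List<Field> fields) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (Field field : fields) {
            ids.add(field.id());
            ids.addAll(collectFieldIds(field.children()));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Mirrors a table with columns: id INT, row_t ROW(name VARCHAR)
        List<Field> columns = List.of(
                Field.leaf(1, "id"),
                new Field(2, "row_t", List.of(Field.leaf(3, "name"))));
        System.out.println(collectFieldIds(columns)); // prints [1, 2, 3]
    }
}
```

With only top-level collection, the id of `name` (3 in this sketch) would be absent from the set, which is the situation the description attributes to the old behavior.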
Additional context and related issues
This is my first time contributing to the Trino code base, so I'm not 100% sure that this is correct. Please let me know if anything is wrong.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:
# Section
* Correctly handle sort order for nested columns on Iceberg tables ({issue}`19620`)
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
Just a note: I've already sent the CLA. Maybe it takes some time to sync.
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
@mattheusv This is in my list still, I am just catching up on things after vacation.
Don't hesitate with any questions.
Thanks @bitsondatadev
I'll just give some context of some changes:
As I mentioned here, the IcebergTestUtils#checkParquetFileSorting method was not handling nested columns, so I've changed the column filter to also consider nested columns. This lets us call the isFileSorted method like isFileSorted(Location.of((String) filePath), "row_t.name").
The problem now seems to be that the file is not sorted: isFileSorted returns false, and I don't quite understand why.
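To make the idea of checking sort order on a nested column concrete, here is a small self-contained sketch. It does not use Trino's test utilities: rows are modeled as nested maps, and `resolve`, `isSortedBy`, and `row` are hypothetical helpers invented for this illustration. It only shows how a dotted path such as "row_t.name" could be followed into a nested value before comparing neighbors.

```java
import java.util.List;
import java.util.Map;

public class NestedSortCheck {
    // Builds a row shaped like (id INT, row_t ROW(name VARCHAR)); hypothetical model.
    static Map<String, Object> row(int id, String name) {
        return Map.of("id", id, "row_t", Map.of("name", name));
    }

    // Follows a dotted path such as "row_t.name" through nested maps.
    // Returns null when any step of the path is missing.
    static Object resolve(Map<String, Object> row, String dottedPath) {
        Object current = row;
        for (String part : dottedPath.split("\\.")) {
            if (!(current instanceof Map<?, ?> map)) {
                return null;
            }
            current = map.get(part);
        }
        return current;
    }

    // Returns true when the string values at the path are in non-decreasing order.
    // Null values are skipped in this sketch.
    static boolean isSortedBy(List<Map<String, Object>> rows, String dottedPath) {
        String previous = null;
        for (Map<String, Object> row : rows) {
            String value = (String) resolve(row, dottedPath);
            if (value == null) {
                continue;
            }
            if (previous != null && previous.compareTo(value) > 0) {
                return false;
            }
            previous = value;
        }
        return true;
    }

    public static void main(String[] args) {
        // String comparison is lexicographic, so "v10" sorts before "v2".
        System.out.println(isSortedBy(List.of(row(1, "v1"), row(10, "v10"), row(2, "v2")), "row_t.name")); // true
        System.out.println(isSortedBy(List.of(row(1, "v1"), row(2, "v2"), row(10, "v10")), "row_t.name")); // false
    }
}
```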
Another problem that I've noticed is that if I execute ALTER TABLE iceberg.test.t EXECUTE optimize on a table sorted by nested columns, it raises an exception:
trino> CREATE TABLE IF NOT EXISTS iceberg.test.t2 (
-> id INT,
-> row_t ROW(name VARCHAR)
-> ) WITH (
-> format = 'PARQUET',
-> sorted_by = ARRAY ['"row_t.name"']
-> );
CREATE TABLE
trino> insert into iceberg.test.t2(id, row_t) SELECT id, ROW(CONCAT('v', cast(id as varchar))) as row_t FROM UNNEST(sequence(1, 30)) AS t(id);
INSERT: 30 rows
trino> alter table iceberg.test.t2 execute optimize;
Query 20240701_125829_00002_62u7s, FAILED, 1 node
Splits: 12 total, 2 done (16,67%)
1,04 [21 rows, 856B] [20 rows/s, 826B/s]
Query 20240701_125829_00002_62u7s failed: Index -1 out of bounds for length 2
Server logs
2024-07-01T09:59:07.561-0300 DEBUG SplitRunner-0 io.trino.execution.executor.dedicated.SplitProcessor Index -1 out of bounds for length 2
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2
at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:79)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:212)
at java.base/java.util.Collections$2.tryAdvance(Collections.java:5074)
at java.base/java.util.Collections$2.forEachRemaining(Collections.java:5082)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:556)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:546)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:702)
at io.trino.operator.PagesIndex.createPagesIndexComparator(PagesIndex.java:453)
at io.trino.operator.PagesIndex.sort(PagesIndex.java:424)
at io.trino.operator.PagesIndex.sort(PagesIndex.java:418)
at io.trino.operator.PagesIndexPageSorter.sort(PagesIndexPageSorter.java:44)
at io.trino.plugin.hive.util.SortBuffer.flushTo(SortBuffer.java:107)
at io.trino.plugin.hive.SortingFileWriter.commit(SortingFileWriter.java:148)
at io.trino.plugin.iceberg.IcebergSortingFileWriter.commit(IcebergSortingFileWriter.java:92)
at io.trino.plugin.iceberg.IcebergPageSink.closeWriter(IcebergPageSink.java:416)
at io.trino.plugin.iceberg.IcebergPageSink.finish(IcebergPageSink.java:230)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSink.finish(ClassLoaderSafeConnectorPageSink.java:84)
at io.trino.operator.TableWriterOperator.finish(TableWriterOperator.java:235)
at io.trino.operator.Driver.processInternal(Driver.java:421)
at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
at io.trino.operator.Driver.tryWithLock(Driver.java:709)
at io.trino.operator.Driver.process(Driver.java:298)
at io.trino.operator.Driver.processForDuration(Driver.java:269)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
at io.trino.$gen.Trino_451_8_gbf763d2____20240701_125715_2.run(Unknown Source)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:168)
at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:155)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
2024-07-01T09:59:07.561-0300 DEBUG dispatcher-query-3 io.trino.execution.StageStateMachine Stage 20240701_125907_00003_62u7s.2 is PENDING
2024-07-01T09:59:07.562-0300 DEBUG stage-scheduler io.trino.execution.scheduler.PipelinedStageExecution Pipelined stage execution 20240701_125907_00003_62u7s.2 is FINISHED
2024-07-01T09:59:07.563-0300 DEBUG Query-20240701_125907_00003_62u7s-269 io.trino.execution.scheduler.policy.PhasedExecutionSchedule scheduledStages: [PipelinedStageStateMachine{stageId=20240701_125907_00003_62u7s.2, state=FINISHED}]
2024-07-01T09:59:07.563-0300 DEBUG Query-20240701_125907_00003_62u7s-269 io.trino.execution.scheduler.policy.PhasedExecutionSchedule blockedFragments: []
2024-07-01T09:59:07.563-0300 DEBUG Query-20240701_125907_00003_62u7s-269 io.trino.execution.scheduler.policy.PhasedExecutionSchedule selectedForExecution: []
2024-07-01T09:59:07.563-0300 DEBUG Query-20240701_125907_00003_62u7s-269 io.trino.execution.scheduler.policy.PhasedExecutionSchedule scheduledStages: [PipelinedStageStateMachine{stageId=20240701_125907_00003_62u7s.1, state=SCHEDULED}, PipelinedStageStateMachine{stageId=20240701_125907_00003_62u7s.2, state=FINISHED}]
2024-07-01T09:59:07.563-0300 DEBUG Query-20240701_125907_00003_62u7s-269 io.trino.execution.scheduler.policy.PhasedExecutionSchedule blockedFragments: []
2024-07-01T09:59:07.563-0300 DEBUG Query-20240701_125907_00003_62u7s-269 io.trino.execution.scheduler.policy.PhasedExecutionSchedule selectedForExecution: []
2024-07-01T09:59:07.563-0300 DEBUG stage-scheduler io.trino.execution.scheduler.PipelinedStageExecution Pipelined stage execution 20240701_125907_00003_62u7s.1 is SCHEDULED
2024-07-01T09:59:07.563-0300 DEBUG remote-task-callback-1 io.trino.execution.scheduler.PipelinedStageExecution Pipelined stage execution for stage 20240701_125907_00003_62u7s.1 failed
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2
2024-07-01T09:59:07.563-0300 DEBUG stage-scheduler io.trino.execution.scheduler.PipelinedStageExecution Pipelined stage execution 20240701_125907_00003_62u7s.1 is FAILED
2024-07-01T09:59:07.564-0300 DEBUG stage-scheduler io.trino.execution.scheduler.PipelinedQueryScheduler Failure in distributed stage for query 20240701_125907_00003_62u7s
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2
2024-07-01T09:59:07.564-0300 DEBUG stage-scheduler io.trino.execution.StageStateMachine Stage 20240701_125907_00003_62u7s.1 failed
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2
2024-07-01T09:59:07.564-0300 DEBUG stage-scheduler io.trino.execution.QueryStateMachine Query 20240701_125907_00003_62u7s failed
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2
It seems that we need to make some other changes for sorted_by with nested columns to work properly, and I don't know whether we should change all the related code in this PR or create multiple PRs. What do you think?
Another problem that I've noticed is that if I execute ALTER TABLE iceberg.test.t EXECUTE optimize on a table sorted by nested columns, it raises an exception:
I've dug a little deeper into this error, and the problem is how Trino looks up columns in the table schema when performing the sort.
The IcebergPageSink class stores the indexes of the columns used to sort a file in the sortColumnIndexes field, and the way this list of indexes is filled does not consider nested columns:
https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java#L189
So when .indexOf(column) is called with a nested column it returns -1, and when the sort is performed in the SortBuffer#flushTo method it raises the java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2 exception.
I don't have many ideas on how this can be fixed, since several classes used in this process rely on the sortColumnIndexes field to sort the file.
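The failure mode described above can be reproduced in isolation with plain Java lists. The sketch below is only an illustration under assumptions: the column list and the `topLevelIndex` helper are hypothetical and do not mirror the real IcebergPageSink code. It shows that looking up a dotted nested path among top-level column names yields -1, and that using that value as a list index fails.

```java
import java.util.List;
import java.util.OptionalInt;

public class SortColumnIndexDemo {
    // Hypothetical helper: resolves a sort column among top-level column names only,
    // returning an empty result instead of -1 when the column is not found.
    static OptionalInt topLevelIndex(List<String> topLevelColumns, String sortColumn) {
        int index = topLevelColumns.indexOf(sortColumn);
        return index < 0 ? OptionalInt.empty() : OptionalInt.of(index);
    }

    public static void main(String[] args) {
        // Top-level columns of a table like (id INT, row_t ROW(name VARCHAR))
        List<String> topLevelColumns = List.of("id", "row_t");

        // A nested path is not a top-level column, so indexOf returns -1.
        int index = topLevelColumns.indexOf("row_t.name");
        System.out.println(index); // prints -1

        // Using -1 as an index fails later, far from where the lookup happened.
        try {
            topLevelColumns.get(index);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("lookup failed: nested path is not a top-level column");
        }

        // Checking the lookup result up front makes the unsupported case explicit.
        System.out.println(topLevelIndex(topLevelColumns, "row_t.name").isPresent()); // prints false
        System.out.println(topLevelIndex(topLevelColumns, "row_t").getAsInt()); // prints 1
    }
}
```

Surfacing the missing index at lookup time would not add nested-column sorting by itself, but it would turn the opaque index error into an explicit unsupported-column case.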