Trino worker dies while processing the query from #21443

Open sajjoseph opened this issue 4 months ago • 6 comments

I was checking on issue #21443 in version 476.

I tried the following query:

SELECT *
FROM (
	SELECT concat_ws('', repeat(concat_ws('', repeat('a', 1000)), 500))
) CROSS JOIN UNNEST(sequence(1, 5000));

The worker node that ran the query died, and the server log gave no indication of what caused the crash.

I then turned on debug logging on the node (io.trino=DEBUG). The output follows:

2025-06-12T11:22:39.473Z        DEBUG   Query-20250612_112234_00000_it8bd-498   io.trino.sql.planner.LogicalPlanner     environment=production,host=worker1 io.trino.sql.planner.optimizations.BeginTableWrite:
Output[columnNames = [_col0, _col1]]
│   Layout: [expr:varchar, field:bigint]
│   _col0 := expr
│   _col1 := field
└─ CrossJoin Unnest[replicate = [expr:varchar], unnest = [expr_0:array(bigint)]]
   │   Layout: [expr:varchar, field:bigint]
   └─ LocalExchange[partitioning = ROUND_ROBIN]
      │   Layout: [expr:varchar, expr_0:array(bigint)]
      └─ Values[]
             Layout: [expr:varchar, expr_0:array(bigint)]
             (varchar 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa....

2025-06-12T11:22:39.741Z        DEBUG   dispatcher-query-3      io.trino.execution.StageStateMachine    environment=production,host=worker1 Stage 20250612_112234_00000_it8bd.0 is PLANNED
2025-06-12T11:22:39.749Z        DEBUG   dispatcher-query-3      io.trino.execution.QueryStateMachine    environment=production,host=worker1  Query 20250612_112234_00000_it8bd is STARTING
2025-06-12T11:22:39.767Z        DEBUG   stage-scheduler io.trino.execution.scheduler.PipelinedStageExecution    environment=production,host=worker1 Pipelined stage execution 20250612_112234_00000_it8bd.0 is PLANNED
2025-06-12T11:22:39.797Z        DEBUG   Query-20250612_112234_00000_it8bd-498   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1  selectedForExecution: [PipelinedStageStateMachine{stageId=20250612_112234_00000_it8bd.0, state=PLANNED}]
2025-06-12T11:22:39.800Z        DEBUG   Query-20250612_112234_00000_it8bd-498   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1  fragmentDependency: isDirected: true, allowsSelfLoops: false, nodes: [0], edges: [], fragmentTopology: isDirected: true, allowsSelfLoops: false, nodes: [0], edges: [], sortedFragments: [0], stagesByFragmentId: {0=PipelinedStageStateMachine{stageId=20250612_112234_00000_it8bd.0, state=PLANNED}}
2025-06-12T11:22:39.804Z        DEBUG   Query-20250612_112234_00000_it8bd-517   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1  scheduledStages: []
2025-06-12T11:22:39.807Z        DEBUG   Query-20250612_112234_00000_it8bd-517   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1 blockedFragments: []
2025-06-12T11:22:39.808Z        DEBUG   Query-20250612_112234_00000_it8bd-517   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1  selectedForExecution: []
2025-06-12T11:22:39.809Z        DEBUG   stage-scheduler io.trino.execution.scheduler.PipelinedStageExecution    environment=production,host=worker1  Pipelined stage execution 20250612_112234_00000_it8bd.0 is SCHEDULING
2025-06-12T11:22:39.811Z        DEBUG   dispatcher-query-3      io.trino.execution.StageStateMachine    environment=production,host=worker1 Stage 20250612_112234_00000_it8bd.0 is SCHEDULING
2025-06-12T11:22:39.844Z        DEBUG   dispatcher-query-3      io.trino.execution.StageStateMachine    environment=production,host=worker1 Stage 20250612_112234_00000_it8bd.0 is RUNNING
2025-06-12T11:22:39.854Z        DEBUG   Query-20250612_112234_00000_it8bd-517   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1 scheduledStages: [PipelinedStageStateMachine{stageId=20250612_112234_00000_it8bd.0, state=SCHEDULED}]
2025-06-12T11:22:39.854Z        DEBUG   stage-scheduler io.trino.execution.scheduler.PipelinedStageExecution    environment=production,host=worker1 Pipelined stage execution 20250612_112234_00000_it8bd.0 is SCHEDULED
2025-06-12T11:22:39.854Z        DEBUG   dispatcher-query-4      io.trino.execution.QueryStateMachine    environment=production,host=worker1 Query 20250612_112234_00000_it8bd is RUNNING
2025-06-12T11:22:39.855Z        DEBUG   Query-20250612_112234_00000_it8bd-517   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1 blockedFragments: []
2025-06-12T11:22:39.855Z        DEBUG   Query-20250612_112234_00000_it8bd-517   io.trino.execution.scheduler.policy.PhasedExecutionSchedule     environment=production,host=worker1 selectedForExecution: []
2025-06-12T11:22:39.914Z        DEBUG   task-notification-0     io.trino.execution.TaskStateMachine     environment=production,host=worker1  Task 20250612_112234_00000_it8bd.0.0.0 is RUNNING
2025-06-12T11:22:40.437Z        DEBUG   stage-scheduler io.trino.execution.scheduler.PipelinedStageExecution    environment=production,host=worker1  Pipelined stage execution 20250612_112234_00000_it8bd.0 is RUNNING

Env: Trino version 476, JDK Temurin 24.0.1, JVM heap size 200 GB

Note: If I change the original query slightly (sequence(1, 5000) to sequence(1, 2000)), the node no longer crashes, but the CLI I used fails with an out-of-memory exception.

SELECT *
FROM (
	SELECT concat_ws('', repeat(concat_ws('', repeat('a', 1000)), 500))
) CROSS JOIN UNNEST(sequence(1, 2000));
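
For a rough sense of scale, here is a minimal Python sketch of how much string data each of the two queries asks for. It assumes one byte per character (the string is all ASCII 'a') and ignores any per-row or per-block overhead, so it is a lower bound on the result size, not a measurement. `result_bytes` is a hypothetical helper written for this estimate, not part of Trino.

```python
# Back-of-the-envelope size of the result set produced by the two queries.
# Assumption: one byte per character, no per-row or per-block overhead.

def result_bytes(inner_repeat: int, outer_repeat: int, rows: int) -> int:
    """Total bytes of varchar data when one string is replicated across all rows."""
    # repeat('a', inner) joined, then repeated outer times and joined again
    string_len = inner_repeat * outer_repeat
    # CROSS JOIN UNNEST copies the same string onto every unnested row
    return string_len * rows

crashing = result_bytes(1000, 500, 5000)   # sequence(1, 5000)
surviving = result_bytes(1000, 500, 2000)  # sequence(1, 2000)

print(f"sequence(1, 5000): {crashing / 1e9:.1f} GB")
print(f"sequence(1, 2000): {surviving / 1e9:.1f} GB")
```

Under these assumptions the crashing query requests about 2.5 GB of string data and the reduced one about 1 GB, all of which is also streamed back to the CLI.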

sajjoseph · Jun 12 '25 11:06