SeekableStreamSupervisor: Don't await task futures in workerExec.

Open · gianm opened this issue 1 year ago · 1 comment

Following #17394, workerExec can deadlock with itself: it blocks waiting for task futures while also serving as the connectExec for the task client, so the threads needed to complete those futures can be the same ones waiting on them. The fix is to never await task futures in workerExec.

There are two specific changes: in verifyAndMergeCheckpoints and checkpointTaskGroup, two coalesceAndAwait calls that formerly occurred in workerExec are replaced with Futures.transform (using a callback in workerExec instead).
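For illustration only, here is a minimal sketch of the pattern. It is not the code in this patch: it uses the JDK's `CompletableFuture` as a stand-in for Guava's `ListenableFuture` and `Futures.transform`, and `WorkerExecSketch`, `fetchOffset`, and `sumOffsets` are hypothetical names. A single-threaded executor plays both roles (worker and connect), which is the situation where blocking would hang.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class WorkerExecSketch {
    // Hypothetical stand-in for a task client call. The response is
    // produced on the same executor that the caller runs on.
    static CompletableFuture<Integer> fetchOffset(ExecutorService exec, int taskId) {
        return CompletableFuture.supplyAsync(() -> taskId * 10, exec);
    }

    // Non-blocking variant: instead of awaiting the task futures on a
    // worker thread, attach a callback that runs on workerExec once all
    // futures are complete.
    static CompletableFuture<Integer> sumOffsets(ExecutorService workerExec, List<Integer> taskIds) {
        List<CompletableFuture<Integer>> futures = taskIds.stream()
            .map(id -> fetchOffset(workerExec, id))
            .collect(Collectors.toList());

        // A blocking version would call join() on each future right here.
        // With one thread doing both jobs, that join() would never return,
        // because the queued fetchOffset work could not start.
        return CompletableFuture
            .allOf(futures.toArray(new CompletableFuture[0]))
            .thenApplyAsync(
                ignored -> futures.stream().mapToInt(CompletableFuture::join).sum(),
                workerExec);
    }

    static int run() {
        ExecutorService workerExec = Executors.newSingleThreadExecutor();
        try {
            // Start the step on workerExec itself, as the supervisor would.
            CompletableFuture<Integer> result = CompletableFuture
                .supplyAsync(() -> sumOffsets(workerExec, List.of(1, 2, 3)), workerExec)
                .thenCompose(f -> f);
            // Only the outside caller waits; no workerExec thread blocks.
            return result.orTimeout(5, TimeUnit.SECONDS).join();
        } finally {
            workerExec.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The `join()` calls inside the callback are safe because the callback only fires after every future has completed, so they return immediately.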

Because this adjustment removes a source of blocking, it may also improve supervisor responsiveness for high task counts. This is not the primary goal, however. The primary goal is to fix the bug introduced by #17394.

gianm · Oct 23 '24 05:10

Viewing this diff with whitespace hidden shows the changes more clearly. Most of the changed lines differ only in indentation.

gianm · Oct 23 '24 05:10