spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-39957][CORE] Delay onDisconnected to enable Driver receives ExecutorExitCode

Open kevin85421 opened this issue 2 years ago • 2 comments

What changes were proposed in this pull request?

When onDisconnected is triggered,

(1) Delay RemoveExecutor for 5 seconds to enable driver receives ExecutorExitCode from slow path (2) Prevent task scheduler from assigning tasks on the lost executor. (By adding the executor to executorsPendingLossReason)

Why are the changes needed?

There are two methods to detect executor loss.

(1) (fast path) onDisconnected Executor -> Driver: When Executor closes its JVM, the socket (Netty's channel) will be closed. The function onDisconnected will be triggered when it knows the channel is closed.

(2) (slow path) ExecutorRunner -> Worker -> Master -> Driver (See #37385 for details) When executor exits with ExecutorExitCode, the exit code will be passed from ExecutorRunner to Driver.

Because fast path determines the executor loss without the information of ExecutorExitCode, these two methods may categorize same cases into different conclusions. For example, when Executor exits with ExecutorExitCode HEARTBEAT_FAILURE, onDisconnected will consider the executor loss as a task failure, but slow path will consider it as a network failure. Obviously, HEARTBEAT_FAILURE is a network failure.

[Notice] For more details about ExecutorExitCode, check #37385 for more details.

Does this PR introduce any user-facing change?

No

How was this patch tested?

build/sbt "core/testOnly *SparkContextSuite -- -z ExitCode"

kevin85421 avatar Aug 03 '22 22:08 kevin85421

cc. @Ngone51 @jiangxb1987

kevin85421 avatar Aug 03 '22 22:08 kevin85421

Can one of the admins verify this patch?

AmplabJenkins avatar Aug 04 '22 23:08 AmplabJenkins

Gentle ping @Ngone51 @mridulm

kevin85421 avatar Aug 23 '22 05:08 kevin85421

I will let @Ngone51 review/merge - I do not have a lot of context on standalone :-)

mridulm avatar Aug 24 '22 06:08 mridulm

Thanks @kevin85421 @mridulm , merged to Master!

Ngone51 avatar Aug 24 '22 12:08 Ngone51

Thank @mridulm @Ngone51 for the review!

kevin85421 avatar Aug 24 '22 16:08 kevin85421