[SPARK-46895][CORE][3.5] Replace Timer with single thread scheduled executor

Open jshmchenxi opened this issue 1 year ago • 5 comments

This is the second commit for fixing SPARK-49479 in branch 3.5. It depends on the first commit (#47957); the third commit is #47956.

What changes were proposed in this pull request?

This PR proposes to replace Timer with a single-thread scheduled executor.

Why are the changes needed?

The Javadoc recommends ScheduledThreadPoolExecutor instead of Timer.

[Screenshot, 2024-01-12 12:47:57 PM]

This change is based on the following two points.

System time sensitivity

Timer schedules tasks against the operating system's absolute (wall-clock) time, so it is sensitive to changes in that time: once the system clock changes, Timer scheduling is no longer accurate. ScheduledThreadPoolExecutor schedules tasks by relative time and is not affected by changes to the system clock.
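As an illustration of the relative-time point, here is a minimal sketch that schedules a task with a relative delay on a single-thread scheduled executor. The class and method names (`RelativeDelayExample`, `runOnce`) are hypothetical and are not taken from the Spark code base. The comment about `System.nanoTime()` describes the OpenJDK implementation and is an assumption about the runtime in use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Spark.
class RelativeDelayExample {

    /** Schedules one task after a relative delay and reports whether it ran before the timeout. */
    static boolean runOnce(long delayMillis, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            // The delay is relative to "now". In OpenJDK the executor tracks it with
            // System.nanoTime(), so adjusting the wall clock does not move the trigger time.
            Runnable task = done::countDown;
            scheduler.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Always release the worker thread, even if the wait was interrupted.
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("task ran: " + runOnce(50, 2000));
    }
}
```

The `finally` block matters as much as the scheduling call: unlike a daemon Timer, the executor's worker thread is non-daemon by default and keeps the JVM alive until the executor is shut down.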

Exception handling

Timer does not catch exceptions thrown by a TimerTask, and it runs all tasks on a single thread. Once a task throws an exception, that thread terminates and the remaining scheduled tasks are never executed. ScheduledThreadPoolExecutor schedules tasks on a thread pool, so after one task throws an exception, other tasks still execute normally.
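The difference in failure behavior can be reproduced with a small standalone sketch. The class and method names below (`FailureIsolationExample`, `timerUsableAfterFailure`, `executorUsableAfterFailure`) are hypothetical and only illustrate the JDK behavior; they do not mirror the code changed in this PR.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of Spark.
class FailureIsolationExample {

    /** Returns true if a Timer still accepts new tasks after one of its tasks threw. */
    static boolean timerUsableAfterFailure() {
        Timer timer = new Timer("demo-timer", true);
        AtomicReference<Thread> timerThread = new AtomicReference<>();
        CountDownLatch started = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override public void run() {
                timerThread.set(Thread.currentThread());
                started.countDown();
                throw new IllegalStateException("boom"); // uncaught: kills the timer thread
            }
        }, 0L);
        try {
            started.await();
            timerThread.get().join(); // wait until the timer thread has fully terminated
            timer.schedule(new TimerTask() { @Override public void run() { } }, 0L);
            return true;
        } catch (IllegalStateException e) {
            return false; // the Timer rejects new tasks once its thread has died
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    /** Returns true if a scheduled executor still runs a later task after an earlier one threw. */
    static boolean executorUsableAfterFailure() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch secondRan = new CountDownLatch(1);
        try {
            Runnable failing = () -> { throw new IllegalStateException("boom"); };
            Runnable healthy = secondRan::countDown;
            // The exception is captured in the returned future; the worker thread survives.
            scheduler.schedule(failing, 0, TimeUnit.MILLISECONDS);
            scheduler.schedule(healthy, 50, TimeUnit.MILLISECONDS);
            return secondRan.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Timer usable after failure: " + timerUsableAfterFailure());
        System.out.println("Executor usable after failure: " + executorUsableAfterFailure());
    }
}
```

One caveat with the executor: because the exception is stored in the returned future rather than propagated, a failing task is silent unless the caller inspects the future or logs inside the task. For periodic tasks scheduled with `scheduleAtFixedRate` or `scheduleWithFixedDelay`, an exception also suppresses later runs of that same task, although other tasks are unaffected.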

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

GA tests.

Was this patch authored or co-authored using generative AI tooling?

'No'.

jshmchenxi, Aug 31 '24 05:08

Kindly ping @beliefer @LuciferYang @dongjoon-hyun.

jshmchenxi, Aug 31 '24 10:08

This is only a minor improvement for Spark 4.0, not a bug fix, and it should not be backported to 3.x.

LuciferYang, Aug 31 '24 12:08

> This is only a minor improvement for Spark 4.0, not a bug fix, and it should not be backported to 3.x.

@LuciferYang Thanks for the reply! As a side effect, it actually includes a fix for https://issues.apache.org/jira/browse/SPARK-49479, by adding shutdown of a timer in BarrierCoordinator.

jshmchenxi, Aug 31 '24 12:08

@jshmchenxi, Can we submit separate pull requests along with their primitive forms?

yaooqinn, Sep 02 '24 03:09

> @jshmchenxi, Can we submit separate pull requests along with their primitive forms?

Yes, I created another PR for the first commit: #47957. The second commit is in this PR.

The fix for SPARK-49479 itself is #47956. It depends on the other two commits to apply the fix in branch 3.5.

jshmchenxi, Sep 02 '24 14:09

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot], Jan 27 '25 00:01