[SPARK-49252][CORE] Make `TaskSetExcludeList` and `HealthTracker` independent

Open tianhanhu opened this issue 1 year ago • 4 comments

What changes were proposed in this pull request?

Allow TaskSetExcludeList and HealthTracker to be enabled independently.

When the application-level HealthTracker is created but taskset-level exclusion is not enabled, TaskSetExcludeList is created in dry-run mode: it still records task failure data and reports it to HealthTracker, but does not participate in scheduler decision making.
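The dry-run behavior described above can be illustrated with a toy model. This is a minimal sketch and not Spark's implementation: the class names `ToyHealthTracker` and `ToyTaskSetExcludeList`, the `max_failures` threshold, and the choice to forward every failure immediately are all assumptions made for illustration.

```python
from collections import defaultdict


class ToyHealthTracker:
    """Hypothetical stand-in for the application-level tracker.

    It only counts reported failures per executor.
    """

    def __init__(self):
        self.failures = defaultdict(int)

    def record_failure(self, executor_id):
        self.failures[executor_id] += 1


class ToyTaskSetExcludeList:
    """Hypothetical stand-in for the taskset-level list with a dry-run switch."""

    def __init__(self, health_tracker, max_failures=2, dry_run=False):
        self.health_tracker = health_tracker
        self.max_failures = max_failures  # assumed threshold, not a Spark default
        self.dry_run = dry_run
        self.failures = defaultdict(int)

    def on_task_failure(self, executor_id):
        # Failure data is recorded and forwarded in both modes.
        self.failures[executor_id] += 1
        if self.health_tracker is not None:
            self.health_tracker.record_failure(executor_id)

    def is_excluded(self, executor_id):
        # In dry-run mode this list never influences scheduling.
        if self.dry_run:
            return False
        return self.failures[executor_id] >= self.max_failures


tracker = ToyHealthTracker()
dry = ToyTaskSetExcludeList(tracker, dry_run=True)
for _ in range(3):
    dry.on_task_failure("exec-1")

print(dry.is_excluded("exec-1"))   # False: dry run never excludes
print(tracker.failures["exec-1"])  # 3: failures still reach the tracker
```

The point of the sketch is the separation of concerns: bookkeeping always happens, while the exclusion decision is gated on the mode.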

Why are the changes needed?

Currently, when spark.excludeOnFailure.enabled is set to true, taskset-level exclusion (TaskSetExcludeList) and application-level exclusion (HealthTracker) are both enabled. In some cases, we only want to enable exclusion at one of these levels.

Does this PR introduce any user-facing change?

Yes. This PR introduces two new user-facing configs, spark.excludeOnFailure.application.enabled and spark.excludeOnFailure.taskAndStage.enabled, which allow exclusion to be enabled for the application level and the taskset level individually.
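As a rough sketch of how the three settings could combine, the helper below resolves which level is active from a plain dictionary of config values. The function `resolve_exclusion` is hypothetical, and the rule that an unset level-specific config falls back to spark.excludeOnFailure.enabled is an assumption not stated in this description; only the config names and the dry-run condition come from the text above.

```python
LEGACY_KEY = "spark.excludeOnFailure.enabled"
APP_KEY = "spark.excludeOnFailure.application.enabled"
TASKSET_KEY = "spark.excludeOnFailure.taskAndStage.enabled"


def _as_bool(value):
    # Treat the string "true" (any case) as enabled, anything else as disabled.
    return str(value).strip().lower() == "true"


def resolve_exclusion(conf):
    """Hypothetical helper: decide which exclusion levels are active.

    Assumption: a level-specific key, when present, wins over the legacy key;
    when absent, the legacy key's value is used.
    """
    legacy = _as_bool(conf.get(LEGACY_KEY, "false"))
    app = _as_bool(conf[APP_KEY]) if APP_KEY in conf else legacy
    taskset = _as_bool(conf[TASKSET_KEY]) if TASKSET_KEY in conf else legacy
    return {
        "application": app,
        "taskAndStage": taskset,
        # Dry run applies when the application level is on but the taskset level is off.
        "tasksetDryRun": app and not taskset,
    }


# Application-level exclusion only: the taskset list would run in dry-run mode.
print(resolve_exclusion({APP_KEY: "true", TASKSET_KEY: "false"}))
```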

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

tianhanhu avatar Aug 16 '24 21:08 tianhanhu

@cloud-fan @jiangxb1987 can I get a review on this PR? Thx!

tianhanhu avatar Aug 16 '24 21:08 tianhanhu

cc @Ngone51

cloud-fan avatar Aug 22 '24 14:08 cloud-fan

It should be good to mention the dryrun mode introduced in this PR.

jiangxb1987 avatar Aug 26 '24 20:08 jiangxb1987

> It should be good to mention the dryrun mode introduced in this PR.

Done updating description.

tianhanhu avatar Aug 27 '24 00:08 tianhanhu

cc @jerryshao @mridulm @Ngone51

jiangxb1987 avatar Aug 28 '24 17:08 jiangxb1987

Thanks, merged to master!

Ngone51 avatar Aug 30 '24 03:08 Ngone51