
[SPARK-52395][CORE] Fast fail when shuffle fetch failure happens


What changes were proposed in this pull request?

Currently, during shuffle reads, ShuffleBlockFetcherIterator fetches local and host-local blocks in the task thread and sends remote block fetch requests in parallel; each fetch result, including FailureFetchResult, is added to a LinkedBlockingQueue and consumed in FIFO order.

This means we have to process all the successfully fetched blocks ahead of a FailureFetchResult before the task can fail. That can take a long time when many successful results, or some large blocks with time-consuming downstream processing, sit in front of the failure in the result queue.
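
For illustration only (a simplified sketch with hypothetical stand-in types, not the actual ShuffleBlockFetcherIterator code), the FIFO behavior looks roughly like this:

```scala
import java.util.concurrent.LinkedBlockingQueue

object FifoFetchSketch {
  sealed trait FetchResult
  case class SuccessFetchResult(blockId: String) extends FetchResult
  case class FailureFetchResult(blockId: String, cause: Throwable) extends FetchResult

  def main(args: Array[String]): Unit = {
    val results = new LinkedBlockingQueue[FetchResult]()

    // Fetch threads append results in arrival order, so a failure can land
    // behind many successful results.
    (1 to 1000).foreach(i => results.add(SuccessFetchResult(s"block_$i")))
    results.add(FailureFetchResult("block_1001", new RuntimeException("fetch failed")))

    // The task thread consumes strictly FIFO: it must take (and, in the real
    // iterator, decompress/deserialize/aggregate) all 1000 successful blocks
    // before it ever sees the failure and can abort the task.
    var failed = false
    while (!failed) {
      results.take() match {
        case SuccessFetchResult(_)    => () // expensive downstream processing happens here
        case FailureFetchResult(_, _) => failed = true // failure surfaces last
      }
    }
  }
}
```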

This PR proposes to fail the shuffle read task as early as possible when a fetch failure happens by:

  1. Use LinkedBlockingDeque instead of LinkedBlockingQueue for the results.
  2. Add FailureFetchResult to the head of the results queue.

Regarding the performance impact of switching to LinkedBlockingDeque, it should not be a big concern here: LinkedBlockingDeque can be less efficient under extremely high concurrency and frequent lock contention, but the shuffle reader is not such a case.
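
A minimal sketch of the proposed queueing behavior (hypothetical stand-in types and method names, not the actual ShuffleBlockFetcherIterator code):

```scala
import java.util.concurrent.LinkedBlockingDeque

object FastFailFetchSketch {
  sealed trait FetchResult
  case class SuccessFetchResult(blockId: String) extends FetchResult
  case class FailureFetchResult(blockId: String, cause: Throwable) extends FetchResult

  // A deque lets failures jump the line while successful results keep FIFO order.
  private val results = new LinkedBlockingDeque[FetchResult]()

  // Fetch threads: successful results go to the tail, as before.
  def onBlockFetchSuccess(blockId: String): Unit =
    results.add(SuccessFetchResult(blockId)) // equivalent to addLast

  // Fetch threads: failures go to the head, so the task thread sees them on
  // its very next take and can fail the task without draining earlier results.
  def onBlockFetchFailure(blockId: String, cause: Throwable): Unit =
    results.addFirst(FailureFetchResult(blockId, cause))

  // Task thread, in next(): blocks until a result is available.
  def nextResult(): FetchResult = results.takeFirst()
}
```

With this arrangement, a failure pushed via addFirst is returned by the very next takeFirst, regardless of how many successful results are already buffered behind it.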

Why are the changes needed?

To fail fast when fetch failures happen in the shuffle reader, avoiding time spent processing previously fetched shuffle blocks when the task is going to fail anyway.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UTs

Was this patch authored or co-authored using generative AI tooling?

No

ivoson (Jun 09 '25 10:06)

cc @Ngone51 @jiangxb1987 @cloud-fan Please take a look. Thanks.

ivoson (Jun 12 '25 02:06)

cc @mridulm @attilapiros

Ngone51 (Jun 13 '25 05:06)

Use LinkedBlockingDeque instead of LinkedBlockingQueue for the results.

Another possible solution to consider would be collecting the FailureFetchResults in a separate collection (which could be a LinkedBlockingQueue) and consulting this new collection first in the next() method.

WDYT?

attilapiros (Jun 18 '25 04:06)

Use LinkedBlockingDeque instead of LinkedBlockingQueue for the results.

Another possible solution to consider would be collecting the FailureFetchResults in a separate collection (which could be a LinkedBlockingQueue) and consulting this new collection first in the next() method.

WDYT?

Thanks @attilapiros

I've thought about this. One concern about having two collections is that it may lead to race conditions such as:

  1. The task thread checks the failure collection and finds it empty.
  2. A fetch failure is then added to the failure collection.
  3. The task thread blocks taking from the success queue, which might never receive new elements.

Dealing with such race conditions might introduce some complexity here, so this PR proposes to use LinkedBlockingDeque instead.
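
For illustration (hypothetical code, not from the PR), the race window with two separate collections would look roughly like this:

```scala
import java.util.concurrent.LinkedBlockingQueue

object TwoQueueRaceSketch {
  sealed trait FetchResult
  case class SuccessFetchResult(blockId: String) extends FetchResult
  case class FailureFetchResult(blockId: String, cause: Throwable) extends FetchResult

  private val successResults = new LinkedBlockingQueue[SuccessFetchResult]()
  private val failureResults = new LinkedBlockingQueue[FailureFetchResult]()

  // Task thread, in next():
  def nextResult(): FetchResult = {
    val failure = failureResults.poll() // (1) check the failure queue: empty
    if (failure != null) {
      failure
    } else {
      // (2) a fetch thread adds a FailureFetchResult to failureResults right here
      // (3) the task thread then blocks on the success queue, which may never
      //     receive another element if all remaining fetches have failed,
      //     so the task hangs instead of failing fast.
      successResults.take()
    }
  }
}
```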

ivoson (Jun 18 '25 05:06)

Dealing with such race conditions might introduce some complexity here

Right, with two collections this would be a lot more complex.

attilapiros (Jun 18 '25 17:06)

Why do we need the feature flag? I fail to see what we lose when this is always on, but I can see the flag adds some extra complexity.

Hi @attilapiros, the feature flag is meant to keep a mitigation path for any potential unforeseen issues, since shuffle is a critical component and this PR slightly changes the behavior of the shuffle fetch iterator.

ivoson (Jun 19 '25 05:06)

We already have too many config parameters, and I cannot see any risk here.

@LuciferYang, @jiangxb1987 WDYT?

attilapiros (Jun 19 '25 13:06)

It indeed seems that there are no apparent risks. I agree to remove the newly added config.

LuciferYang (Jun 20 '25 02:06)

Thanks for sharing your opinions. Removing the feature flag. cc @attilapiros @LuciferYang @jiangxb1987

ivoson (Jun 20 '25 05:06)

Hi @attilapiros, please take another look. Thanks.

ivoson (Jun 24 '25 03:06)

Merged into master for Apache Spark 4.1.0. Thanks @ivoson @attilapiros @Ngone51 @jiangxb1987 and @mridulm

LuciferYang (Jun 24 '25 04:06)

Thanks for your review @LuciferYang @jiangxb1987 @Ngone51 @attilapiros @mridulm

ivoson (Jun 24 '25 04:06)