[SPARK-25342][CORE][SQL] Support rolling back a result stage and rerunning all result tasks when writing files

Open caican00 opened this issue 3 years ago • 3 comments

What changes were proposed in this pull request?

From https://github.com/apache/spark/pull/22112 we learn that Spark currently cannot roll back and rerun a result stage; the job just fails.

This PR addresses some scenarios of that problem. The output of a job's result stage is written to a storage system, which can be either a file system or a database.

  1. If the result is written to a file system, it is stored in a temporary directory until the result stage completes successfully. If a result stage whose map stage is indeterminate fails after committing output for some partitions, we can delete these temporary files and roll back the result stage.
  2. If the result is written to a database, it is written directly to the database. If a result stage whose map stage is indeterminate fails after some result tasks have succeeded, the output that has already been written cannot be rolled back.
  3. Therefore, the main purpose of this PR is to support result stage rollback when writing to a file system.
  4. I added a new flag, isResultStageRetryAllowed, to the RDD class to indicate whether the corresponding result stage supports retries. It is a Boolean that defaults to false, meaning result stage rollback is not supported, which corresponds to the database scenario. When writing to a file system, the result stage supports retries and isResultStageRetryAllowed is set to true.
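
The decision described in the list above can be sketched as a small standalone model. This is a toy illustration in Python, not the Spark scheduler code from this PR: the `ResultStage` class, the `on_fetch_failure` function, the returned strings, and the staging directory layout are all hypothetical names chosen for the example, and the `retry-missing` branch is an assumption about what happens when rollback is not needed. Only the `is_result_stage_retry_allowed` field mirrors the flag named in the description.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ResultStage:
    """Toy stand-in for a result stage; not Spark's actual class."""
    parent_is_indeterminate: bool
    # Mirrors the proposed flag; False by default (the database scenario).
    is_result_stage_retry_allowed: bool = False


def on_fetch_failure(stage: ResultStage, staging_dir: Path, committed: set) -> str:
    """Decide how to react when a result stage fails part-way through."""
    if not stage.parent_is_indeterminate or not committed:
        # Assumption of this sketch: with deterministic input, or with
        # nothing committed yet, rerunning the missing tasks is enough.
        return "retry-missing"
    if not stage.is_result_stage_retry_allowed:
        # Output went straight to the sink and cannot be undone.
        return "abort"
    # File-system scenario: discard the temporary output and start over.
    shutil.rmtree(staging_dir, ignore_errors=True)
    staging_dir.mkdir(parents=True, exist_ok=True)
    committed.clear()
    return "rollback-and-rerun-all"


# Usage: a file-backed stage with one committed partition is rolled back.
staging = Path(tempfile.mkdtemp()) / "_temporary"
staging.mkdir()
(staging / "part-00000").write_text("rows")
file_committed = {0}
file_decision = on_fetch_failure(
    ResultStage(parent_is_indeterminate=True, is_result_stage_retry_allowed=True),
    staging,
    file_committed,
)

# Usage: a database-backed stage keeps the default flag and is aborted.
db_decision = on_fetch_failure(
    ResultStage(parent_is_indeterminate=True), staging, {0}
)
```

In this model the flag is the only thing that separates the two sinks: the file-backed stage ends with an empty staging directory and no committed partitions, while the database-backed stage is aborted because its partial output cannot be removed.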

Does this PR introduce any user-facing change?

No

How was this patch tested?

New tests and manual testing.

write to hive image

write to iceberg image

write to hdfs image

write to mysql image

caican00 avatar Aug 01 '22 12:08 caican00

gently ping @cloud-fan Can you help to review this PR

caican00 avatar Aug 01 '22 13:08 caican00

Can one of the admins verify this patch?

AmplabJenkins avatar Aug 01 '22 20:08 AmplabJenkins

@cloud-fan Hi, could you help to review this pr? Thanks

caican00 avatar Aug 09 '22 10:08 caican00

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Nov 18 '22 00:11 github-actions[bot]