spark
[SPARK-25342][CORE][SQL] Support rolling back a result stage and rerunning all result tasks when writing files
What changes were proposed in this pull request?
From https://github.com/apache/spark/pull/22112 we learned that Spark currently cannot roll back and rerun a result stage; the job simply fails.
This PR addresses some scenarios of that problem. When a job's result stage outputs its result to a storage system, the result is written either to a file system or to a database system.
- If the result is written to a file system, it is stored in a temporary directory until the result stage runs successfully. If a result stage whose map stage is indeterminate fails after committing output for some partitions, we can delete these temporary files and roll back the result stage.
- If the result is written to a database system, it is written directly to the database. Therefore, if a result stage whose map stage is indeterminate fails after some result tasks have succeeded, the output that has already been written cannot be rolled back.
- Therefore, the main purpose of this PR is to support result stage rollback when writing to any file system.
- I added a new identifier, `isResultStageRetryAllowed`, to the RDD class to indicate whether its corresponding result stage supports retries. It is a Boolean variable whose default value is false, indicating that result stage rollback is not supported; this corresponds to the scenario of writing to a database. When writing to a file system, the result stage supports retries, and `isResultStageRetryAllowed` is set to true.
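
The decision described in the list above can be sketched as a small standalone model. This is not Spark code and not the patch itself: `ResultStage`, `on_result_stage_failure`, `rollback_staged_output`, and the returned action strings are hypothetical names chosen for illustration, and only the flag's meaning and its default of false come from the description. The sketch assumes staged output lives in a single temporary directory that can be deleted as a whole.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ResultStage:
    """Hypothetical stand-in for what the scheduler knows about a result stage."""
    is_result_stage_retry_allowed: bool = False  # default false, as in the description
    map_stage_indeterminate: bool = False
    committed_partitions: int = 0


def on_result_stage_failure(stage: ResultStage) -> str:
    """Pick an action when a result stage fails part-way through (illustrative only)."""
    if not stage.map_stage_indeterminate or stage.committed_partitions == 0:
        # Nothing committed can disagree with recomputed input,
        # so rerunning only the missing tasks is enough.
        return "retry_failed_tasks"
    if stage.is_result_stage_retry_allowed:
        # File-system sink: staged files can be discarded and every task rerun.
        return "rollback_and_rerun_all"
    # Database sink: rows already written cannot be taken back.
    return "fail_job"


def rollback_staged_output(staging_dir: Path) -> None:
    """Delete everything staged so far and leave an empty directory for the rerun."""
    shutil.rmtree(staging_dir, ignore_errors=True)
    staging_dir.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    file_sink = ResultStage(is_result_stage_retry_allowed=True,
                            map_stage_indeterminate=True,
                            committed_partitions=3)
    db_sink = ResultStage(map_stage_indeterminate=True, committed_partitions=3)
    print(on_result_stage_failure(file_sink))
    print(on_result_stage_failure(db_sink))

    staging = Path(tempfile.mkdtemp()) / "_temporary"
    staging.mkdir()
    (staging / "part-00000").write_text("partial output")
    rollback_staged_output(staging)
    print(sorted(p.name for p in staging.iterdir()))
```

Keeping the flag false by default means a sink has to opt in to rollback, so any writer that does not set it keeps the existing fail-fast behavior.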
Does this PR introduce any user-facing change?
No
How was this patch tested?
New tests, plus manual tests of the following scenarios:
write to hive

write to iceberg

write to hdfs

write to mysql

Gentle ping @cloud-fan, could you help review this PR?
Can one of the admins verify this patch?
@cloud-fan Hi, could you help review this PR? Thanks.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!