[FLINK-34518][runtime] Fixes AdaptiveScheduler#suspend bug when the job is suspended during Restarting phase
What is the purpose of the change
See comment in FLINK-34518 for more details.
Brief change log
- Overwrites the ExecutionGraph's state when suspending the job in the Restarting phase: the actual state might be CANCELLED, which is a globally-terminal state and can therefore trigger an HA data cleanup, which we don't want when restarting the job. The cancellation of the ExecutionGraph is more like an implementation detail of the Restarting state and shouldn't be exposed.
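The idea behind this change can be illustrated with a small, self-contained toy model. This is only a sketch and not the actual Flink code: the `Status` enum, the `reportedWithoutOverride`/`reportedWithOverride` helpers, and the `wouldCleanUpHaData` check are hypothetical names introduced for illustration, and the use of `SUSPENDED` as the overriding status is an assumption.

```java
public class SuspendDuringRestartSketch {

    /** Toy stand-in for a job status; only globally vs. locally terminal matters here. */
    enum Status {
        RESTARTING(false),
        SUSPENDED(false),  // locally terminal: the job may be recovered later
        CANCELLED(true);   // globally terminal: the job is done for good

        private final boolean globallyTerminal;

        Status(boolean globallyTerminal) {
            this.globallyTerminal = globallyTerminal;
        }

        boolean isGloballyTerminal() {
            return globallyTerminal;
        }
    }

    /** Hypothetical pre-fix behaviour: the internal graph status leaks out on suspend. */
    static Status reportedWithoutOverride(Status executionGraphStatus) {
        return executionGraphStatus;
    }

    /** Hypothetical post-fix behaviour: the internal status is overwritten on suspend. */
    static Status reportedWithOverride(Status executionGraphStatus) {
        // The graph's cancellation is an implementation detail of restarting,
        // so it is hidden behind a non-globally-terminal status (assumed: SUSPENDED).
        return Status.SUSPENDED;
    }

    /** In this toy model, HA data is cleaned up only for globally-terminal statuses. */
    static boolean wouldCleanUpHaData(Status reported) {
        return reported.isGloballyTerminal();
    }

    public static void main(String[] args) {
        // While restarting, the old graph may already have been cancelled internally.
        Status internal = Status.CANCELLED;

        System.out.println("cleanup without override: "
                + wouldCleanUpHaData(reportedWithoutOverride(internal)));
        System.out.println("cleanup with override: "
                + wouldCleanUpHaData(reportedWithOverride(internal)));
    }
}
```

In this model, only the variant that overwrites the status keeps the HA data in place, so the job stays recoverable after the suspend.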
Verifying this change
- Added unit test to cover this scenario
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
CI report:
- 3176a981a1692542fef60ebad55d1b80e60c8d60 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build