flink
[FLINK-34518][runtime] Fixes AdaptiveScheduler#suspend bug when the job is suspended during Restarting phase
What is the purpose of the change
See comment in FLINK-34518 for more details.
Brief change log
- Overwrites the ExecutionGraph's state when suspending the job in the Restarting phase: the actual state might be CANCELLED, which is a globally-terminal state and can therefore trigger an HA data cleanup. We don't want that cleanup when the job is only being restarted. The cancellation of the ExecutionGraph is more of an implementation detail of the Restarting state and shouldn't be exposed.
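The idea in the change log entry above can be illustrated with a small, self-contained model. This is only a sketch and not the code of this PR: the class SuspendDuringRestartSketch, its Status enum, and both helper methods are hypothetical names, and it assumes that the overwritten state is SUSPENDED (a state that is not globally terminal), which the description does not state explicitly.

```java
public class SuspendDuringRestartSketch {

    /** Toy job status (hypothetical, not Flink's own enum); the flag marks globally-terminal states. */
    enum Status {
        RESTARTING(false),
        SUSPENDED(false),
        CANCELLED(true);

        private final boolean globallyTerminal;

        Status(boolean globallyTerminal) {
            this.globallyTerminal = globallyTerminal;
        }

        boolean isGloballyTerminal() {
            return globallyTerminal;
        }
    }

    /**
     * Returns the status that is exposed when the job is suspended.
     * Assumption: during the restarting phase the internal status is overwritten with SUSPENDED,
     * so that the internal cancellation of the execution graph stays hidden.
     */
    static Status statusReportedOnSuspend(boolean suspendedWhileRestarting, Status executionGraphStatus) {
        return suspendedWhileRestarting ? Status.SUSPENDED : executionGraphStatus;
    }

    /** In this toy model, HA data is cleaned up only for globally-terminal states. */
    static boolean triggersHaCleanup(Status reported) {
        return reported.isGloballyTerminal();
    }

    public static void main(String[] args) {
        // While restarting, the execution graph was cancelled internally.
        Status internal = Status.CANCELLED;
        Status reported = statusReportedOnSuspend(true, internal);
        System.out.println("internal=" + internal
                + ", reported=" + reported
                + ", HA cleanup=" + triggersHaCleanup(reported));
    }
}
```

Without the overwrite, the internal CANCELLED status would be passed through and treated as globally terminal, which is the situation the fix is meant to avoid.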
Verifying this change
- Added unit test to cover this scenario
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
CI report:
- 3176a981a1692542fef60ebad55d1b80e60c8d60 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build