[FLINK-35552][runtime] Moves CheckpointStatsTracker out of DefaultExecutionGraphFactory into Scheduler
PR Chain
- FLINK-35550: https://github.com/apache/flink/pull/24909
- FLINK-35551: https://github.com/apache/flink/pull/24910
- ⭐ FLINK-35552: https://github.com/apache/flink/pull/24911
- FLINK-35553: https://github.com/apache/flink/pull/24912
What is the purpose of the change
The AdaptiveScheduler needs access to the CheckpointStatsTracker to monitor checkpoint-related events.
Brief change log
- Refactors CheckpointStatsTracker constructor to not rely on the total subtask count anymore when initializing the tracker
- Moves CheckpointStatsTracker ownership from DefaultExecutionGraphFactory to the scheduler implementations
- Makes CheckpointStatsTracker an "implementation detail" of the execution graph that's not exposed through API.
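The ownership change listed above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: every class and method name (SketchStatsTracker, SketchExecutionGraph, SketchGraphFactory, SketchScheduler, reportPendingCheckpoint) is hypothetical, and passing the subtask count along with each event is an assumption about how the constructor dependency might be avoided.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a checkpoint stats tracker (not Flink's class).
// It is constructed without a total subtask count; the count is supplied
// together with each reported event instead (an assumption of this sketch).
class SketchStatsTracker {
    private final List<String> events = new ArrayList<>();

    void reportPendingCheckpoint(long checkpointId, int subtaskCount) {
        events.add("pending:" + checkpointId + ":" + subtaskCount);
    }

    List<String> events() {
        return events;
    }
}

// Hypothetical execution graph: uses the tracker internally and exposes no getter for it.
class SketchExecutionGraph {
    private final SketchStatsTracker tracker;
    private final int subtaskCount;

    SketchExecutionGraph(SketchStatsTracker tracker, int subtaskCount) {
        this.tracker = tracker;
        this.subtaskCount = subtaskCount;
    }

    void triggerCheckpoint(long checkpointId) {
        tracker.reportPendingCheckpoint(checkpointId, subtaskCount);
    }
}

// Hypothetical factory: receives the tracker from its caller instead of creating it.
class SketchGraphFactory {
    SketchExecutionGraph create(SketchStatsTracker tracker, int subtaskCount) {
        return new SketchExecutionGraph(tracker, subtaskCount);
    }
}

// Hypothetical scheduler: owns the tracker and hands the same instance to every graph it builds.
class SketchScheduler {
    private final SketchStatsTracker tracker = new SketchStatsTracker();
    private final SketchGraphFactory factory = new SketchGraphFactory();
    private SketchExecutionGraph graph;

    void createGraph(int subtaskCount) {
        graph = factory.create(tracker, subtaskCount);
    }

    void triggerCheckpoint(long checkpointId) {
        graph.triggerCheckpoint(checkpointId);
    }

    List<String> observedEvents() {
        return tracker.events();
    }
}
```

In this sketch, a scheduler that rebuilds its graph with a different subtask count keeps observing events through the one tracker it owns, which is the kind of access the AdaptiveScheduler is described as needing.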
Verifying this change
- Existing tests are covering the change.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable
CI report:
- f52650d9a1586e971d5736a0ac69dc4a06f03bc4 UNKNOWN
- 3764e627199e89cbb72aaaf9bc47e8fee7097704 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build
@flinkbot run azure
Force-pushed the rebase onto the most recent version of the base PR #24910:
$ git rebase --onto=FLINK-35551 6af9560d62f963fc8d85d26807627aa452932fcf
@flinkbot run azure
I rebased the branch to master after the base PR #24910 was merged to master.
Observed CI failures documented:
- FLINK-25453 for the SqlGatewayE2ECase.testMaterializedTableInFullMode
- FLINK-35722 for the CoordinatorEventsToStreamOperatorRecipientExactlyOnceITCase.testCheckpoint
@flinkbot run azure
CI with AdaptiveScheduler enabled was successful. I'm gonna go ahead and prepare this PR to be merged (i.e. remove the DO-NOT-MERGE commit).