Roman

Results: 21 comments by Roman

I'm [concerned](https://github.com/apache/flink/pull/20421#discussion_r941260334) about introducing a term that is unnecessary from my perspective, but I wouldn't mind merging this PR as it is.

I still have the same [concern](https://github.com/apache/flink/pull/20421#discussion_r940348087). Someone else may well disagree with me and approve the PR; I'd be absolutely fine with that.

Thanks for the PR, @JesseAtSZ. I'm trying to understand why checkpoints are failing with `FINALIZE_CHECKPOINT_FAILURE` (which is ignored by `CheckpointFailureManager`) and not something like `IOException`. From the code, it...

Thanks @JesseAtSZ, could you please confirm that the job is stateless?

> the initialization on Coordinator will before the performCheckpoint on TM.

[Not always](https://github.com/apache/flink/blob/16109a31468949f09c2a7bba9003761726e3d61c/flink-runtime/src/main/java/org/apache/flink/runtime/state/filesystem/FsCheckpointStorageAccess.java#L145) - if a default location was...

Got it, thanks. My concern is that we should probably fix the error handling in `CheckpointFailureManager`, instead of the `mkdirs` call:

```
case FINALIZE_CHECKPOINT_FAILURE:
    // ignore
    break;
```

The [documentation](execution.checkpointing.tolerable-failed-checkpoints) for...
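To make the concern concrete, here is a minimal sketch of how an ignored failure reason interacts with a tolerable-failure threshold. It is not Flink's actual `CheckpointFailureManager`: the class `TolerableFailureSketch`, its `Reason` enum, and the "fail once continuous failures exceed the threshold" rule are simplifying assumptions made for illustration only.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model; names and rules are assumptions, not Flink code.
class TolerableFailureSketch {

    // Illustrative subset of failure reasons.
    public enum Reason { CHECKPOINT_DECLINED, FINALIZE_CHECKPOINT_FAILURE, IO_EXCEPTION }

    private final int tolerableFailures;
    private final Set<Reason> ignoredReasons;
    private int continuousFailures = 0;

    public TolerableFailureSketch(int tolerableFailures, Set<Reason> ignoredReasons) {
        this.tolerableFailures = tolerableFailures;
        this.ignoredReasons = ignoredReasons;
    }

    // Returns true when the job should be failed because the threshold was exceeded.
    public boolean onFailure(Reason reason) {
        if (ignoredReasons.contains(reason)) {
            // Ignored reasons never count, so they can repeat indefinitely.
            return false;
        }
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    // A completed checkpoint resets the count of continuous failures.
    public void onSuccess() {
        continuousFailures = 0;
    }

    public static void main(String[] args) {
        TolerableFailureSketch ignoring =
                new TolerableFailureSketch(0, EnumSet.of(Reason.FINALIZE_CHECKPOINT_FAILURE));
        TolerableFailureSketch counting =
                new TolerableFailureSketch(0, EnumSet.noneOf(Reason.class));

        // Same failure, different outcome depending on whether the reason is ignored.
        System.out.println("ignored reason fails job: "
                + ignoring.onFailure(Reason.FINALIZE_CHECKPOINT_FAILURE));
        System.out.println("counted reason fails job: "
                + counting.onFailure(Reason.FINALIZE_CHECKPOINT_FAILURE));
    }
}
```

Under these assumptions, a job whose checkpoints always fail at finalization would keep running with a threshold of zero as long as the reason is ignored, which is the behavior questioned above.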

Regarding `TRIGGER_CHECKPOINT_FAILURE`, I **had** the following concerns - but after checking the code they turned out to be wrong: If counted as failure: - "not all tasks are running" could...

I've checked it locally and found that `WindowDistinctAggregateITCase` fails with `Exceeded checkpoint tolerable failure threshold`, which is preceded by ``` Caused by: org.apache.flink.util.FlinkRuntimeException: The vertex GlobalWindowAggregate[28] -> Calc[29] -> LocalWindowAggregate[30]...

@gaoyunhaii did you have a chance to look at the test? @JesseAtSZ right now it's only a single test failure, which is likely related to a particular case (finished sources)....