[FLINK-28515][checkpoint]Try to clean up localSnapshot files after checkpoint aborted

Open ljz2051 opened this issue 2 years ago • 2 comments

What is the purpose of the change

This pull request fixes the problem that files in the local recovery directory are not cleaned up properly after a checkpoint is aborted.

Brief change log

  • When TaskLocalStateStoreImpl aborts a checkpoint (abortCheckpoint), check whether that checkpoint has been registered in TaskLocalStateStoreImpl
  • Try to delete the localRecovery directory even if the checkpoint is not registered in TaskLocalStateStoreImpl
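
The two changes above can be illustrated with a minimal, self-contained sketch. This is not the code from this pull request and not Flink's actual TaskLocalStateStoreImpl: the class LocalStateStoreSketch, its methods, and the "chk-<id>" directory layout are all hypothetical names chosen for illustration. It only shows the idea that aborting a checkpoint removes its on-disk directory whether or not the checkpoint was ever registered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical, simplified stand-in for a task-local state store (not Flink's class). */
class LocalStateStoreSketch {
    private final Path localRecoveryRoot;
    // Checkpoint id -> label of the registered local snapshot (assumed representation).
    private final Map<Long, String> registeredSnapshots = new HashMap<>();

    LocalStateStoreSketch(Path localRecoveryRoot) {
        this.localRecoveryRoot = localRecoveryRoot;
    }

    /** Assumed layout: one directory per checkpoint under the local recovery root. */
    Path checkpointDirectory(long checkpointId) {
        return localRecoveryRoot.resolve("chk-" + checkpointId);
    }

    void register(long checkpointId, String snapshotLabel) {
        registeredSnapshots.put(checkpointId, snapshotLabel);
    }

    /**
     * Aborts a checkpoint. Returns true if the checkpoint had been registered.
     * The directory is deleted in both cases, so files written before
     * registration do not leak.
     */
    boolean abortCheckpoint(long checkpointId) throws IOException {
        // Step 1: check whether the checkpoint was registered, dropping the entry if so.
        boolean wasRegistered = registeredSnapshots.remove(checkpointId) != null;
        // Step 2: try to delete the directory even for an unregistered checkpoint.
        deleteRecursively(checkpointDirectory(checkpointId));
        return wasRegistered;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // Nothing was written for this checkpoint.
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

In this sketch, calling abortCheckpoint(5L) on a store that never saw register(5L, ...) still removes the chk-5 directory if it exists, which is the behavior the second bullet describes.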

Verifying this change

This change added tests and can be verified as follows: org.apache.flink.runtime.state.TaskLocalStateStoreImplTest#abortUnregisteredCheckpoint()

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? no

ljz2051, Jul 12 '22 11:07

CI report:

  • 3856f807fb83149dca2d4261ef2443a9e82a1ac1 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot, Jul 12 '22 11:07

@ljz2051 Thanks for your contribution. Could you please rebase onto master to resolve the conflicts? Thanks.

klion26, Sep 21 '22 06:09