
[FLINK-35265] Implement FlinkStateSnapshot custom resource

Open mateczagany opened this issue 9 months ago • 0 comments

What is the purpose of the change

Implement FlinkStateSnapshot according to FLIP-446. This PR does not include the e2e tests or the documentation.

Brief change log

  • Added FlinkStateSnapshot and all its dependent classes to flink-kubernetes-operator-api
  • Deprecated several fields in FlinkDeployment/FlinkSessionJob as accepted in the FLIP
  • Refactored several methods in FlinkService to extract the logic of saving the snapshot path into other classes
  • Added a test in the FlinkConfigManager class that checks at runtime whether the FlinkStateSnapshot CR can be created on the current Kubernetes server. This is intended to be temporary, to ensure a smooth upgrade process.
  • Refactored metric- and status-related classes to be able to handle the new CR

Verifying this change

  • Added unit-tests for new features
  • Manual testing

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., any changes to the CustomResourceDescriptors: yes
  • Core observer or reconciler logic that is regularly executed: yes

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? Not documented yet; a separate PR will add the documentation before this one gets merged

Other implementation details

  • For periodic snapshots, the Operator will create new FlinkStateSnapshot CRs, and each snapshot will be taken when its resource is reconciled. The labels of these resources are not final yet.
  • For upgrade snapshots, the Operator will create a new FlinkStateSnapshot CR, marking it with alreadyExists.
  • Manual snapshots won't work with savepointTriggerNonce under the new CR; the user is expected to create FlinkStateSnapshot CRs themselves.
  • Two new configurations were also added that were not specified in the FLIP:
    • periodic.savepoint.dispose-on-delete
    • job.upgrade.savepoint.dispose-on-delete
  • Other metrics and configurable max history age/count will be implemented in FLINK-35492 and FLINK-35493 respectively.
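
The three snapshot origins described above can be sketched as a small, self-contained Java model. This is only an illustration and not the code in this PR: the class, enum, and method names (SnapshotFlowSketch, Origin, SnapshotResource, needsTrigger) are hypothetical, and the rule that a resource marked alreadyExists is not triggered again during reconciliation is an assumption based on how FLIP-446 describes that field.

```java
/**
 * Hypothetical model of how FlinkStateSnapshot resources come into being.
 * None of these types exist in flink-kubernetes-operator; they only
 * illustrate the behaviour described in the PR text.
 */
public class SnapshotFlowSketch {

    /** Who is responsible for creating the resource. */
    enum Origin { PERIODIC, UPGRADE, MANUAL }

    /** Simplified stand-in for a FlinkStateSnapshot custom resource. */
    static final class SnapshotResource {
        final String jobName;
        final Origin origin;
        final String path;           // null until a snapshot has been taken
        final boolean alreadyExists; // true if the snapshot was taken before the CR was created

        SnapshotResource(String jobName, Origin origin, String path, boolean alreadyExists) {
            this.jobName = jobName;
            this.origin = origin;
            this.path = path;
            this.alreadyExists = alreadyExists;
        }

        /** Assumption: reconciliation only triggers snapshots that do not exist yet. */
        boolean needsTrigger() {
            return !alreadyExists;
        }
    }

    /** Periodic: the Operator creates the CR, the snapshot is taken on reconcile. */
    static SnapshotResource forPeriodic(String jobName) {
        return new SnapshotResource(jobName, Origin.PERIODIC, null, false);
    }

    /** Upgrade: the savepoint was already taken, so the CR only records it. */
    static SnapshotResource forUpgrade(String jobName, String savepointPath) {
        return new SnapshotResource(jobName, Origin.UPGRADE, savepointPath, true);
    }

    /** Manual: the user creates the CR themselves instead of bumping savepointTriggerNonce. */
    static SnapshotResource forManual(String jobName) {
        return new SnapshotResource(jobName, Origin.MANUAL, null, false);
    }

    public static void main(String[] args) {
        SnapshotResource[] resources = {
            forPeriodic("example-job"),
            forUpgrade("example-job", "/savepoints/example"),
            forManual("example-job")
        };
        for (SnapshotResource r : resources) {
            System.out.println(r.origin + " needsTrigger=" + r.needsTrigger());
        }
    }
}
```

In this sketch only the upgrade resource skips triggering, because its savepoint path is already known when the resource is created.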

mateczagany, Apr 29 '24 11:04