[FLINK-37155] [Runtime/Coordination] Implementing FLIP-505 for Flink History Server scalability improvements to decouple local and remote storage

Open • achang52 opened this issue 4 months ago • 1 comment

What is the purpose of the change

Implements FLIP-505, which improves Flink History Server scalability by decoupling the local job archive cache from the remote archive store.

Brief change log

  • Adds two new Flink History Server configuration options: historyserver.archive.cached-retained-jobs and historyserver.archive.num-cached-most-recently-viewed-jobs
  • Decouples the number of stored job archives from the size of the local cache by enabling remote storage
  • Enables fetching a single job archive by job ID
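
For reviewers who want a mental model of the behavior described above, here is a minimal sketch of a bounded local cache that falls back to a remote store on a miss and evicts the least recently viewed job. It is not the code in this PR: the class JobArchiveCache, its methods, and the String-based archive representation are hypothetical, and the real eviction policy may differ from the plain LRU assumed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical illustration only, not the History Server implementation.
 * Keeps at most maxCachedJobs archives locally and fetches the rest
 * from a remote store by job ID on demand.
 */
final class JobArchiveCache {
    // Assumed remote lookup: job ID -> archive contents (null if unknown).
    private final Function<String, String> remoteFetcher;
    private final Map<String, String> local;
    private int remoteFetches = 0;

    JobArchiveCache(int maxCachedJobs, Function<String, String> remoteFetcher) {
        this.remoteFetcher = remoteFetcher;
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.local = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently viewed archive once the cache is over capacity.
                return size() > maxCachedJobs;
            }
        };
    }

    /** Returns the archive for jobId, reading the remote store only on a local miss. */
    synchronized String getArchive(String jobId) {
        String cached = local.get(jobId); // a hit also marks the job as recently viewed
        if (cached != null) {
            return cached;
        }
        String fetched = remoteFetcher.apply(jobId);
        remoteFetches++;
        if (fetched != null) {
            local.put(jobId, fetched);
        }
        return fetched;
    }

    /** Checks local presence without changing the access order. */
    synchronized boolean isCachedLocally(String jobId) {
        return local.containsKey(jobId);
    }

    synchronized int remoteFetchCount() {
        return remoteFetches;
    }
}
```

With a capacity of 2, viewing jobs a, b, a, c in that order leaves a and c cached locally and evicts b, while the remote store is read three times. The archive for b stays available remotely, which is the decoupling this change is after.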

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

This change added tests and can be verified as follows:

  • Added new tests to HistoryServerArchiveFetcherTest.java that validate how cached jobs are evicted and how the local and remote caches interact
  • Added tests to HistoryServerTest.java and WebFrontendBootstrapTest to cover local and remote caching behavior
  • Manually verified by deploying the Flink History Server locally with test job archives.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? Documented in JavaDocs and in FLIP-505: https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch

achang52 • Aug 06 '25 21:08

CI report:

  • 35220bb7caaf6e1ad05ce106e23cac1551ad3654 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot • Aug 06 '25 21:08