[FLINK-33391][runtime] Support tasks balancing at TM level for Adaptive Scheduler.

Open RocMarshal opened this issue 1 year ago • 1 comments

What is the purpose of the change

[FLINK-33391][runtime] Support tasks balancing at TM level for Adaptive Scheduler.

Brief change log

  • Introduce the abstraction and interface for describing load.
  • Introduce the TASKS value for the TaskManagerLoadBalanceMode enum.
  • Support tasks balancing at TM level for Adaptive Scheduler.
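
To make the idea of balancing tasks at the TM level concrete, here is a minimal sketch. It is not the code from this PR or from Flink: the class name TaskBalanceSketch, the method assign, and the greedy heaviest-first strategy are all assumptions chosen for illustration. It treats the number of tasks in each slot request as that request's load and always places the next request on the TaskManager that currently holds the fewest tasks and still has a free slot.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this class does not exist in the Flink code base. */
class TaskBalanceSketch {

    /**
     * Greedily assigns slot requests to TaskManagers so that task counts stay even.
     *
     * @param tasksPerRequest number of tasks carried by each slot request (its assumed "load")
     * @param taskManagers number of available TaskManagers
     * @param slotsPerTaskManager free slots on every TaskManager
     * @return total number of tasks placed on each TaskManager
     */
    static int[] assign(int[] tasksPerRequest, int taskManagers, int slotsPerTaskManager) {
        if (tasksPerRequest.length > taskManagers * slotsPerTaskManager) {
            throw new IllegalArgumentException("Not enough slots for all requests");
        }
        int[] load = new int[taskManagers];
        int[] usedSlots = new int[taskManagers];

        // Sort ascending, then walk backwards so the heaviest requests are placed first.
        int[] sorted = tasksPerRequest.clone();
        Arrays.sort(sorted);

        for (int i = sorted.length - 1; i >= 0; i--) {
            // Pick the least-loaded TaskManager that still has a free slot
            // (ties go to the lowest index).
            int target = -1;
            for (int tm = 0; tm < taskManagers; tm++) {
                boolean hasFreeSlot = usedSlots[tm] < slotsPerTaskManager;
                if (hasFreeSlot && (target == -1 || load[tm] < load[target])) {
                    target = tm;
                }
            }
            load[target] += sorted[i];
            usedSlots[target]++;
        }
        return load;
    }

    public static void main(String[] args) {
        // Four slot requests carrying 3, 3, 1 and 1 tasks, two TaskManagers with two slots each.
        System.out.println(Arrays.toString(assign(new int[] {3, 3, 1, 1}, 2, 2)));
    }
}
```

With the four requests in main, filling the first TaskManager before the second would put 6 tasks on one TM and 2 on the other, whereas the sketch spreads them as 4 and 4. The actual PR may use a different strategy and data model.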

Verifying this change

This change is already covered by existing tests, such as:

  • TaskBalancedAbstractRequestSlotMatcherTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

RocMarshal avatar Oct 15 '24 09:10 RocMarshal

CI report:

  • 655f44418bd4791252110e4ce2164b935e78aba0 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Oct 15 '24 10:10 flinkbot

Hi @mxm, could you help take a look? Any input is appreciated.

RocMarshal avatar Jul 01 '25 06:07 RocMarshal

Could you elaborate a bit why this change is important?

mxm avatar Jul 03 '25 08:07 mxm

Could you elaborate a bit why this change is important?

Thank you very much for your attention, @mxm. Let me explain the origin and motivation of this PR. This feature comes from FLIP-370 [1], which primarily addresses performance problems caused by resource bottlenecks when tasks are distributed unevenly across TaskManagers. Hope this helps :)

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling

RocMarshal avatar Jul 03 '25 09:07 RocMarshal

Hi @RocMarshal! Thanks for the PR! In the PR description you mention TaskBalancedAbstractRequestSlotMatcherTest, but I cannot find it in the changes.

Hi @mxm, thanks for the reminder. The PR was updated multiple times, but I neglected to update the PR description. I have updated it just now.

RocMarshal avatar Aug 27 '25 07:08 RocMarshal