[FLINK-33386][runtime] Support tasks balancing at slot level for Default Scheduler

Open RocMarshal opened this issue 2 years ago • 10 comments

What is the purpose of the change

  • Support tasks balancing at slot level for Default Scheduler

Brief change log

  • Introduce BalancedPreferredSlotSharingStrategy to support tasks balancing at slot level.
  • Expose the configuration item to switch tasks balancing at slot level for Default Scheduler.
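As a rough illustration of what "tasks balancing at slot level" means, here is a minimal sketch. It is not the BalancedPreferredSlotSharingStrategy from this PR: the class name `SlotBalancingSketch`, the method `assign`, and the greedy least-loaded rule are all assumptions made for illustration. It takes one slot per unit of the largest parallelism and gives each vertex's subtasks to the currently least-loaded slots, one subtask per slot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not the BalancedPreferredSlotSharingStrategy from this PR. */
final class SlotBalancingSketch {

    /**
     * Places every subtask of every vertex into one of N slots, where N is the
     * largest parallelism. Returns, per slot, labels of the form "vertex#subtask".
     */
    static List<List<String>> assign(Map<String, Integer> parallelismByVertex) {
        int slotCount = 0;
        for (int p : parallelismByVertex.values()) {
            slotCount = Math.max(slotCount, p);
        }
        List<List<String>> slots = new ArrayList<>();
        for (int i = 0; i < slotCount; i++) {
            slots.add(new ArrayList<>());
        }

        // Handle vertices with higher parallelism first (stable sort keeps ties in input order).
        List<Map.Entry<String, Integer>> vertices = new ArrayList<>(parallelismByVertex.entrySet());
        vertices.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        for (Map.Entry<String, Integer> vertex : vertices) {
            // Order slot indices by current load, least loaded first.
            List<Integer> byLoad = new ArrayList<>();
            for (int i = 0; i < slotCount; i++) {
                byLoad.add(i);
            }
            byLoad.sort(Comparator.comparingInt((Integer i) -> slots.get(i).size()));
            // One subtask per slot, so subtasks of the same vertex never share a slot.
            for (int sub = 0; sub < vertex.getValue(); sub++) {
                slots.get(byLoad.get(sub)).add(vertex.getKey() + "#" + sub);
            }
        }
        return slots;
    }
}
```

For three vertices A, B and C with parallelism 4, 2 and 2, this sketch leaves two tasks in each of the four slots, whereas an index-aligned placement (subtask i always in slot i) would leave 3, 3, 1 and 1.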

Verifying this change

This change added tests and can be verified as follows:

  • org.apache.flink.runtime.scheduler.BalancedPreferredSlotSharingStrategyTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Oct 31 '23 15:10 RocMarshal

CI report:

  • 50f3dd8ed507a3ad36df53f013f900332b0d0b96 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

Oct 31 '23 15:10 flinkbot

Thank you very much for your comments, @KarmaGYZ @1996fanrui. I have updated the PR accordingly. Have a great weekend~ :)

Nov 11 '23 14:11 RocMarshal

Hi @KarmaGYZ, thanks for your thorough review!

> I think this PR contains two components. The first is a supplement to FLINK-33448. The second is part of the TASKS strategy. I think we may split it into two separate commits.

Splitting it makes sense; it's clearer.

> It would be better to include FLINK-33388 and introduce the TASKS strategy.

Would you mind if we keep them in multiple PRs? I'm afraid a single PR with a lot of commits and changes would be hard to review. Of course, a single PR is also acceptable to me.

Nov 13 '23 07:11 1996fanrui

Hi @KarmaGYZ @1996fanrui, thank you very much for your patient review comments. I have updated the PR based on them. PTAL when you have time. Have a nice weekend~

Nov 25 '23 14:11 RocMarshal

The waiting mechanism is ready for review. @KarmaGYZ @1996fanrui, would you help take a look when you have time? Thank you very much~ The test verification part will be refactored after the external JUnit 5 migration is done.

Dec 04 '23 02:12 RocMarshal

Thank you @1996fanrui @KarmaGYZ very much for the review.

I have re-evaluated the implementation location of the waiting mechanisms based on @KarmaGYZ's offline suggestions.

If the two waiting mechanisms are placed in DeclarativeSlotPool, there would be more precise and concise information to maintain.

  • The maintenance of reserve/free slot/resource profiles should be simpler and more intuitive.

If we can reach an agreement on this, I would like to confirm again: do we still use mainThreadExecutor to implement the timeout check of the waiting mechanism? If so, this may require changing the create method of DeclarativeSlotPoolFactory.
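For readers unfamiliar with the pattern, a timeout-based wait driven by a single-threaded executor could look roughly like the sketch below. Everything in it is an assumption for illustration: `RequestWaiterSketch`, `onRequest` and `result` are hypothetical names, a plain `ScheduledExecutorService` stands in for the main thread executor, and the rule "finish once the expected number of requests arrives or the timeout fires" is only one possible reading of the waiting mechanism.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; class and method names are not Flink APIs. */
final class RequestWaiterSketch {
    private final ScheduledExecutorService mainThread;
    private final int expected;
    private final long timeoutMillis;
    private final CompletableFuture<Integer> done = new CompletableFuture<>();

    // Both fields are only read and written on the main thread, so no locking is needed.
    private int received;
    private ScheduledFuture<?> timeoutCheck;

    RequestWaiterSketch(ScheduledExecutorService mainThread, int expected, long timeoutMillis) {
        this.mainThread = mainThread;
        this.expected = expected;
        this.timeoutMillis = timeoutMillis;
    }

    /** May be called from any thread; the state change is handed to the main thread. */
    void onRequest() {
        mainThread.execute(() -> {
            if (done.isDone()) {
                return;
            }
            received++;
            if (timeoutCheck == null) {
                // Start the timeout when the first request arrives.
                timeoutCheck = mainThread.schedule(
                        () -> { done.complete(received); }, timeoutMillis, TimeUnit.MILLISECONDS);
            }
            if (received >= expected) {
                // Everything arrived in time: the timeout check is no longer needed.
                timeoutCheck.cancel(false);
                done.complete(received);
            }
        });
    }

    /** Completes with the number of requests seen when the wait ended. */
    CompletableFuture<Integer> result() {
        return done;
    }
}
```

A caller would create the executor with `Executors.newSingleThreadScheduledExecutor()`, pass it to the constructor, and shut it down when finished. Because the timeout check needs that executor, whatever component hosts the wait has to receive it at construction time, which is the kind of factory signature change raised above.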

Please let me know your opinions.

Dec 05 '23 14:12 RocMarshal

@RocMarshal Just curious about the progress: is this PR still waiting for some comments to be addressed before it can be merged?

Jan 15 '24 16:01 Myasuka

> @RocMarshal Just curious about the progress: is this PR still waiting for some comments to be addressed before it can be merged?

This PR is in progress now. We plan to merge it after the complete Task Balancing feature is implemented.

Jan 16 '24 02:01 KarmaGYZ

👋 Hi, are there any updates or progress on this work as part of FLIP-370?

Jun 05 '24 19:06 klam-shop

> 👋 Hi, are there any updates or progress on this work as part of FLIP-370?

Thanks for your attention. It's still in progress. I will update it in the next few days.

Jun 06 '24 03:06 RocMarshal