druid icon indicating copy to clipboard operation
druid copied to clipboard

fix capacity issue for category aware

Open AlbericByte opened this issue 1 year ago • 13 comments

Fixes #15847.

Description

Fix the bug of capacity api which is not category aware The proposal solution is following:

  1. If current WorkerSelectStrategy is category based strategy(FillCapacityWithCategorySpecWorkerSelectStrategy and EqualDistributionWithCategorySpecWorkerSelectStrategy), we will calculate the capacity group by category in getWorkerCategoryCapacity method.
  2. if current WorkerSelectStrategy is not category based strategy, the getWorkerCategoryCapacity will return null.
  3. In OverlordResource#getTotalWorkerCapacity method, get the capacity group by category from WorkerSelectStrategy#getWorkerCategoryCapacity
  4. In CoordinatorDutyUtils#getTotalWorkerCapacity if overlordClient.getTotalWorkerCapacity() return category based capacity, return Math.min(totalCapacity, (int) (totalWorkerCapacity * taskSlotRatio)); here totalCapacity is category capacity. Summary: according to the WorkerSelectStrategy(category based or not), set the category capacity in TotalWorkerCapacityResponse, in client side, compare the category capacity and totalWorkerCapacity * taskSlotRatio, take the smaller one

Fixed the bug 15847

Release note

Fix the bug of getCompactionTaskCapacity which is not category aware
Update the getTotalWorkerCapacity of OverlordResource

if WorkerSelectStrategy is category based strategy(FillCapacityWithCategorySpecWorkerSelectStrategy and EqualDistributionWithCategorySpecWorkerSelectStrategy), TotalWorkerCapacityResponse will contains the capacity group by category.


Key changed/added classes in this PR
  • CategoryCapacityInfo
  • OverlordResource
  • TotalWorkerCapacityResponse
  • EqualDistributionWithCategorySpecWorkerSelectStrategy
  • FillCapacityWithCategorySpecWorkerSelectStrategy
  • WorkerSelectStrategy
  • WorkerSelectUtils
  • OverlordResourceTest
  • EqualDistributionWithCategorySpecWorkerSelectStrategyTest
  • FillCapacityWithCategorySpecWorkerSelectStrategyTest
  • IndexingCategoryCapacityInfo
  • IndexingTotalWorkerCapacityInfo
  • IndexingWorker
  • CompactSegments
  • CoordinatorDutyUtils
  • KillUnusedSegments
  • OverlordClientImplTest
  • CompactSegmentsTest
  • KillUnusedSegmentsTest

This PR has:

  • [x] been self-reviewed.
  • [ ] using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • [ ] added documentation for new or modified features or behaviors.
  • [ ] a release note entry in the PR description.
  • [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • [ ] added or updated version, license, or notice information in licenses.yaml
  • [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • [x] added integration tests.
  • [x] been tested in a test Druid cluster.

AlbericByte avatar May 05 '24 19:05 AlbericByte

@AlbericByte , thanks for the changes. Please add an explanation of the approach you have taken in the PR description and also include a release note for the new API changes.

kfaraz avatar May 06 '24 03:05 kfaraz

@AlbericByte , thanks for the changes. Please add an explanation of the approach you have taken in the PR description and also include a release note for the new API changes.

@kfaraz i added the explanation and updated the release note section. for release note section, it is my first time, i am not sure it is correct format, if not, could you help to share the right format, i will update asap.

And I will fix the test

AlbericByte avatar May 06 '24 17:05 AlbericByte

@kfaraz @abhishekagarwal87 and @gianm for the failing test case, i have try sever times in my local environment using the latest code, it can successfully. do i need to increase the java memory for test, welcome for any suggestion?

org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTaskResourceTest.testAPIs
Error:    Run 1: ParallelIndexSupervisorTaskResourceTest.testAPIs:214 » TestTimedOut test timed out after 20000 milliseconds
Error:    Run 2: ParallelIndexSupervisorTaskResourceTest.testAPIs:214 » TestTimedOut test timed out after 20000 milliseconds

AlbericByte avatar May 10 '24 19:05 AlbericByte

hi @kfaraz @abhishekagarwal87 and @gianm Could you help to review this pr when you are not busy, i hope i can contribute more with your help.

AlbericByte avatar May 14 '24 18:05 AlbericByte

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Sep 09 '24 00:09 github-actions[bot]

unstale

jakubmatyszewski avatar Sep 09 '24 08:09 jakubmatyszewski

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Nov 09 '24 00:11 github-actions[bot]

unstale

jakubmatyszewski avatar Nov 11 '24 20:11 jakubmatyszewski

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Jan 11 '25 00:01 github-actions[bot]

unstale

jakubmatyszewski avatar Jan 20 '25 09:01 jakubmatyszewski

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Mar 22 '25 00:03 github-actions[bot]

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar May 27 '25 00:05 github-actions[bot]

Sorry, this one fell through the cracks. Are you still interested in working on it @AlbericByte? If so, please let me know, I will do a review.

gianm avatar May 31 '25 00:05 gianm

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Jul 30 '25 00:07 github-actions[bot]

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions[bot] avatar Aug 28 '25 00:08 github-actions[bot]

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Oct 29 '25 00:10 github-actions[bot]

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions[bot] avatar Nov 27 '25 00:11 github-actions[bot]