
HDDS-7233. Add DiskBalancerService on Datanode

Open symious opened this issue 3 years ago • 4 comments

What changes were proposed in this pull request?

This ticket adds the DiskBalancerService on the Datanode, which does the actual balancing work.

The points of this PR are as follows:

  1. A new background service is added to datanode as "DiskBalancerService".
  2. This service has 4 parameters: shouldRun, threshold, bandwidthInMB, parallelThread. These parameters are updated by requests from SCM. On each update, the latest values are persisted to a YAML file, and this file is also read when the datanode starts.
  3. As a background service, the service's main procedure will be invoked at an interval and do the following steps:
    1. Check "shouldRun". If it is false, the service skips this loop and does no disk balancing.
    2. Check "bandwidth". A counter records the bytes balanced within a window and is used to calculate the next available time to run a balance job under the "bandwidthInMB" limit.
    3. Check the available thread count and try to find the same number of volume pairs to start balance jobs.
  4. The balance job first copies the container to the dest volume's tmp directory; the copy is then loaded as a new container and replaces the original container.
  5. During balancing, a set of inProgressContainers is maintained. For deleteBlock and move container commands from SCM, the datanode checks whether the related container is being balanced; if so, the command is requeued, up to a maximum requeue limit. Since the bandwidth limit works in a "quick balance, long delay" mode, the requeueing should not last long.
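
The loop checks and the requeue rule above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the class `DiskBalancerSketch`, its methods, the `Decision` enum, and the `GIVE_UP` outcome once the requeue limit is reached are all assumptions made for the example. Only the parameter names (shouldRun, bandwidthInMB, parallelThread) and inProgressContainers come from the description.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the loop checks; names other than the parameters are assumed. */
class DiskBalancerSketch {
  // Parameters named in the PR description; default values here are made up.
  volatile boolean shouldRun = true;
  volatile long bandwidthInMB = 10;
  volatile int parallelThread = 2;

  // Earliest time (ms) at which the next balance job may start.
  private volatile long nextAvailableTimeMs = 0;
  private final Set<Long> inProgressContainers = ConcurrentHashMap.newKeySet();

  /** Delay so that bytesBalanced averages out to bandwidthInMB per second. */
  static long delayMs(long bytesBalanced, long bandwidthInMB) {
    long bytesPerSecond = bandwidthInMB * 1024L * 1024L;
    return bytesBalanced * 1000L / bytesPerSecond;
  }

  /** Steps 1 to 3 of the loop: returns how many balance jobs may start now. */
  int tasksToStart(long nowMs, int busyThreads) {
    if (!shouldRun) {
      return 0; // skip this loop entirely
    }
    if (nowMs < nextAvailableTimeMs) {
      return 0; // still inside the bandwidth-imposed delay
    }
    return Math.max(0, parallelThread - busyThreads);
  }

  /** After a burst of copying, push the next run time back ("quick balance, long delay"). */
  void onBalanced(long nowMs, long bytesBalanced) {
    nextAvailableTimeMs = nowMs + delayMs(bytesBalanced, bandwidthInMB);
  }

  boolean markInProgress(long containerId) {
    return inProgressContainers.add(containerId);
  }

  void markDone(long containerId) {
    inProgressContainers.remove(containerId);
  }

  /** GIVE_UP is an assumption; the description only says there is a maximum requeue limit. */
  enum Decision { EXECUTE, REQUEUE, GIVE_UP }

  /** Decide what to do with a deleteBlock or move command that targets containerId. */
  Decision decide(long containerId, int requeueCount, int maxRequeue) {
    if (!inProgressContainers.contains(containerId)) {
      return Decision.EXECUTE;
    }
    return requeueCount < maxRequeue ? Decision.REQUEUE : Decision.GIVE_UP;
  }
}
```

With these assumed defaults, balancing 20 MiB at a 10 MB/s limit yields a 2000 ms delay before the next job may start, which is why a requeued command would normally not wait long.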

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7233

How was this patch tested?

unit test.

symious avatar Sep 19 '22 06:09 symious

@ChenSammi @ferhui @lokeshj1703 @siddhantsangwan @sodonnel @neils-dev @JacksonYao287 Could you help to review this PR?

symious avatar Sep 19 '22 14:09 symious

This is a pretty big PR. @symious would it be possible for you to give a short summary of the changes so it's easier to review?

siddhantsangwan avatar Sep 22 '22 12:09 siddhantsangwan

@siddhantsangwan Sure, added a summary in PR's description, please have a look.

symious avatar Sep 23 '22 08:09 symious

@xichen01 could you please review this PR?

ferhui avatar Sep 26 '22 06:09 ferhui

@lokeshj1703 Thanks for your detailed review! do you have other comments?

ferhui avatar Oct 20 '22 07:10 ferhui

@ChenSammi @lokeshj1703 @siddhantsangwan @sodonnel @neils-dev @JacksonYao287 Could you help to review this PR?

hey guys~ let's start to gradually digest this pr together ~ lol

DaveTeng0 avatar Jan 30 '23 07:01 DaveTeng0

Raised a new ticket HDDS-7383 for better review. Please have a look. The other PRs will be on the way.

@symious Can you add to the PR's description or as a comment, the new tickets that were raised from this PR?

xBis7 avatar Jan 30 '23 17:01 xBis7

Can you add to the PR's description or as a comment, the new tickets that were raised from this PR?

@xBis7 Sure updated in the description.

symious avatar Jan 31 '23 01:01 symious

@symious can you please rebase, your branch is 400+ commits behind master.

kerneltime avatar Feb 06 '23 05:02 kerneltime

can you please rebase, your branch is 400+ commits behind master.

Sure, I will try to do the rebase shortly.

symious avatar Feb 07 '23 07:02 symious

This PR has been split into https://github.com/apache/ozone/pull/4887 and https://github.com/apache/ozone/pull/3874, so I'm closing this one.

symious avatar Jul 03 '23 01:07 symious