HDDS-13697. Disk balancer should not run over under utilised datanode volumes

Open Gargi-jais11 opened this issue 5 months ago • 5 comments

What changes were proposed in this pull request?

Suppose the disk densities are 1%, 2%, 8%, and 9%. These disks are treated as unbalanced even though all of them are under utilised, and triggering the disk balancer in this scenario is not expected. Requiring a minimum disk density of 60% before a volume can act as a balancer source avoids this.

Solution:

  • Add a new configuration, hdds.datanode.disk.balancer.min.source.volume.density, with a default value of 60%. A volume is considered a source volume only if its utilisation is greater than or equal to 60%. This prevents the above scenario, in which the disk balancer runs across under utilised disks that merely appear imbalanced, and so avoids unnecessary data movement.
  • Update design and feature doc.
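
The effect of the proposed minimum source density can be illustrated with a minimal sketch. This is not the code from the patch: the class name, method name, and the "average density plus threshold" imbalance rule are assumptions made for illustration, and only the 60% default comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual Ozone DiskBalancer code.
class MinSourceDensitySketch {
    // Mirrors the proposed 60% default of
    // hdds.datanode.disk.balancer.min.source.volume.density (value format assumed).
    static final double MIN_SOURCE_DENSITY_PERCENT = 60.0;

    /**
     * Returns the indexes of volumes that may act as balancing sources.
     * Assumed rule: a volume is a candidate when its density exceeds the
     * average density by more than thresholdPercent, and it is accepted only
     * if its density is also at least minSourceDensityPercent.
     */
    static List<Integer> pickSourceVolumes(double[] densities,
                                           double thresholdPercent,
                                           double minSourceDensityPercent) {
        double sum = 0;
        for (double d : densities) {
            sum += d;
        }
        double ideal = densities.length == 0 ? 0 : sum / densities.length;

        List<Integer> sources = new ArrayList<>();
        for (int i = 0; i < densities.length; i++) {
            boolean overIdeal = densities[i] > ideal + thresholdPercent;
            boolean denseEnough = densities[i] >= minSourceDensityPercent;
            if (overIdeal && denseEnough) {
                sources.add(i);
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        double[] underUtilised = {1, 2, 8, 9};   // the scenario from the description
        double[] busy = {40, 55, 70, 95};

        // Without the gate (minimum 0%), the 8% and 9% volumes become sources.
        System.out.println(pickSourceVolumes(underUtilised, 2.0, 0.0));
        // With the 60% gate, no under utilised volume is selected.
        System.out.println(pickSourceVolumes(underUtilised, 2.0, MIN_SOURCE_DENSITY_PERCENT));
        // Volumes that are both imbalanced and dense are still selected.
        System.out.println(pickSourceVolumes(busy, 2.0, MIN_SOURCE_DENSITY_PERCENT));
    }
}
```

In this sketch the gate is applied after the imbalance check, so a node with genuinely full volumes behaves as before, while a node whose volumes are all nearly empty yields no source volume and no data movement.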

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13697

How was this patch tested?

Updated existing UT. Tested on Docker:

bash-5.1$ ozone admin datanode diskbalancer start -t 0.0001 -m 5 -a
Start DiskBalancer on datanode(s):
All datanodes
bash-5.1$ ozone admin datanode diskbalancer status
Status result:
Datanode                            Status          Threshold(%)    BandwidthInMB   Threads      SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB) EstTimeLeft(min)
ozone-datanode-2.ozone_default      RUNNING         0.0001          10              5            5            0            4611            0               0              
ozone-datanode-3.ozone_default      RUNNING         0.0001          10              5            5            0            4614            0               0              
ozone-datanode-1.ozone_default      RUNNING         0.0001          10              5            5            0            4628            0               0              
ozone-datanode-4.ozone_default      RUNNING         0.0001          10              5            5            0            4613            0               0              
ozone-datanode-5.ozone_default      RUNNING         0.0001          10              5            5            0            4620            0               0              

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.

bash-5.1$ ozone admin datanode diskbalancer update -m 15 -a
Update DiskBalancer Configuration on datanode(s):
All datanodes

bash-5.1$ ozone admin datanode diskbalancer status
Status result:
Datanode                            Status          Threshold(%)    BandwidthInMB   Threads      SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB) EstTimeLeft(min)
ozone-datanode-2.ozone_default      STOPPED         0.0001          10              5            9            0            8305            0               0              
ozone-datanode-3.ozone_default      STOPPED         0.0001          10              5            9            0            8305            0               0              
ozone-datanode-1.ozone_default      STOPPED         0.0001          10              5            10           0            9260            0               0              
ozone-datanode-4.ozone_default      STOPPED         0.0001          10              5            10           0            9226            0               0              
ozone-datanode-5.ozone_default      STOPPED         0.0001          10              5            10           0            9232            0               0              

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.

Gargi-jais11 avatar Oct 07 '25 07:10 Gargi-jais11

@ChenSammi and @sumitagrawl. Could you please review this?

Gargi-jais11 avatar Oct 07 '25 10:10 Gargi-jais11

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

github-actions[bot] avatar Nov 11 '25 00:11 github-actions[bot]

@Gargi-jais11 please resolve conflicts

adoroszlai avatar Nov 13 '25 11:11 adoroszlai

@Gargi-jais11 please resolve conflicts

@adoroszlai this PR is on hold for now. Once HDDS-13878 is merged, we can come back to this and get it merged.

Gargi-jais11 avatar Nov 13 '25 11:11 Gargi-jais11

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

github-actions[bot] avatar Dec 16 '25 00:12 github-actions[bot]

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

github-actions[bot] avatar Dec 24 '25 00:12 github-actions[bot]