HDDS-13697. Disk balancer should not run over underutilised datanode volumes
What changes were proposed in this pull request?
Suppose the densities (utilisation) of a datanode's disks are 1%, 2%, 8% and 9%. This is treated as an unbalanced datanode, even though all of its disks are underutilised. Triggering the disk balancer in this scenario is not expected. Requiring a minimum disk density (for example, >= 60%) before the balancer runs would avoid this.
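A worked version of this example, assuming all four volumes have equal capacity and that imbalance is judged by each volume's deviation from the average (ideal) density:

```math
d_{\text{ideal}} = \frac{1\% + 2\% + 8\% + 9\%}{4} = 5\%, \qquad
\lvert d_i - d_{\text{ideal}} \rvert = 4\%,\ 3\%,\ 3\%,\ 4\%
```

With any threshold below 3%, every volume falls outside the allowed band, so the datanode is reported as unbalanced even though its fullest volume is only 9% utilised.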
Solution:
- Add a new configuration, hdds.datanode.disk.balancer.min.source.volume.density, with a default value of 60%. A volume is considered a source volume only if its utilisation is greater than or equal to 60%. This prevents the scenario above, where the disk balancer runs across disks that appear imbalanced but are all underutilised, and thus avoids unnecessary data movement.
- Update the design and feature docs.
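A minimal Java sketch of the proposed source-volume rule follows. It is not Ozone's DiskBalancer code: the class SourceVolumeFilter, the Volume record and the method names are hypothetical, and it assumes that the "ideal" density is total used bytes over total capacity and that thresholds are expressed as fractions (0.60 for 60%). Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Ozone's actual DiskBalancer code) of the
 * source-volume rule proposed in this PR. All densities and thresholds
 * are fractions in [0, 1], e.g. 0.60 means 60%.
 */
final class SourceVolumeFilter {

  /** Hypothetical minimal volume model: used bytes over capacity. */
  record Volume(String name, long usedBytes, long capacityBytes) {
    double density() {
      return (double) usedBytes / capacityBytes;
    }
  }

  private SourceVolumeFilter() {
  }

  /** Assumed ideal density: total used bytes over total capacity. */
  static double idealDensity(List<Volume> volumes) {
    long used = 0;
    long capacity = 0;
    for (Volume v : volumes) {
      used += v.usedBytes();
      capacity += v.capacityBytes();
    }
    return capacity == 0 ? 0.0 : (double) used / capacity;
  }

  /**
   * Returns the volumes allowed to act as a balancing source: a volume
   * must exceed the ideal density by more than the threshold AND be at
   * or above the minimum source density (60% by default per this PR).
   */
  static List<Volume> eligibleSources(
      List<Volume> volumes, double threshold, double minSourceDensity) {
    double ideal = idealDensity(volumes);
    List<Volume> sources = new ArrayList<>();
    for (Volume v : volumes) {
      boolean aboveThreshold = v.density() - ideal > threshold;
      boolean denseEnough = v.density() >= minSourceDensity;
      if (aboveThreshold && denseEnough) {
        sources.add(v);
      }
    }
    return sources;
  }

  public static void main(String[] args) {
    // The 1%, 2%, 8%, 9% scenario from the description, equal capacities.
    List<Volume> underUtilised = List.of(
        new Volume("disk1", 1, 100), new Volume("disk2", 2, 100),
        new Volume("disk3", 8, 100), new Volume("disk4", 9, 100));
    // Without a minimum density, disk3 and disk4 are picked as sources.
    System.out.println(eligibleSources(underUtilised, 0.01, 0.0).size());
    // With a 60% minimum, no volume qualifies, so nothing is moved.
    System.out.println(eligibleSources(underUtilised, 0.01, 0.60).size());
  }
}
```

Running main prints 2 and then 0. Keeping the minimum-density check separate from the threshold check means the threshold still decides whether a datanode looks unbalanced, while the new setting only decides whether a volume is full enough to be worth draining.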
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13697
How was this patch tested?
Updated existing unit tests. Tested on Docker:
bash-5.1$ ozone admin datanode diskbalancer start -t 0.0001 -m 5 -a
Start DiskBalancer on datanode(s):
All datanodes
bash-5.1$ ozone admin datanode diskbalancer status
Status result:
Datanode                        Status   Threshold(%)  BandwidthInMB  Threads  SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB)  EstTimeLeft(min)
ozone-datanode-2.ozone_default  RUNNING  0.0001        10             5        5            0            4611            0                   0
ozone-datanode-3.ozone_default  RUNNING  0.0001        10             5        5            0            4614            0                   0
ozone-datanode-1.ozone_default  RUNNING  0.0001        10             5        5            0            4628            0                   0
ozone-datanode-4.ozone_default  RUNNING  0.0001        10             5        5            0            4613            0                   0
ozone-datanode-5.ozone_default  RUNNING  0.0001        10             5        5            0            4620            0                   0
Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.
bash-5.1$ ozone admin datanode diskbalancer update -m 15 -a
Update DiskBalancer Configuration on datanode(s):
All datanodes
bash-5.1$ ozone admin datanode diskbalancer status
Status result:
Datanode                        Status   Threshold(%)  BandwidthInMB  Threads  SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB)  EstTimeLeft(min)
ozone-datanode-2.ozone_default  STOPPED  0.0001        10             5        9            0            8305            0                   0
ozone-datanode-3.ozone_default  STOPPED  0.0001        10             5        9            0            8305            0                   0
ozone-datanode-1.ozone_default  STOPPED  0.0001        10             5        10           0            9260            0                   0
ozone-datanode-4.ozone_default  STOPPED  0.0001        10             5        10           0            9226            0                   0
ozone-datanode-5.ozone_default  STOPPED  0.0001        10             5        10           0            9232            0                   0
Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.
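The note's estimate can be written roughly as follows, assuming BandwidthInMB is a per-second rate in MB and the result is converted to minutes; the exact rounding used by Ozone is not shown here:

```math
\text{EstTimeLeft(min)} \approx \frac{\text{EstBytesToMove(MB)}}{\text{BandwidthInMB (MB/s)} \times 60}
```

As an illustration, 6000 MB left to move at 10 MB/s would give about 10 minutes. In the runs above EstBytesToMove(MB) is 0, which matches the reported EstTimeLeft of 0 minutes.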
@ChenSammi @sumitagrawl, could you please review this?
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
@Gargi-jais11 please resolve conflicts
@adoroszlai, this PR is on hold for now. Once HDDS-13878 is merged, we can get back to this and get it merged.
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.