HDDS-9889. Configure adaptation for datanode limits in ContainerBalancer
Dynamic adaptation (introduced in HDDS-5526) for datanodes.involved.max.percentage.per.iteration in ContainerBalancer doesn't work well in some cases.
Sometimes the number of under-utilized nodes is not sufficient to satisfy the limit on the maximum percentage of datanodes participating in a balancing iteration (datanodes.involved.max.percentage.per.iteration). As a result, the collections of source and target datanodes are reset and balancing is skipped (see comment).
The issue is easily reproduced on clusters with few nodes (< 10), for example 4 or 5. To fix this case we have to set the datanodes.involved.max.percentage.per.iteration value to 100.
What changes were proposed in this pull request?
Rework some test cases to use clusters with a small number of datanodes.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9889
How was this patch tested?
hdds.scm.container.balancer.TestContainerBalancerTask is reworked:
- Extracted two classes:
  - hdds.scm.container.balancer.MockedSCM for setting up a testable hdds.scm.server.StorageContainerManager
  - hdds.scm.container.balancer.TestableCluster for creating a test cluster with a required number of datanodes
- Added TestContainerBalancerDatanodeNodeLimit with 3 tests extracted from TestContainerBalancerTask, run against clusters with different node counts (a rough usage sketch follows below).
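A rough illustration of how a parameterized test over different cluster sizes could be wired with these helpers. The constructors and assertions below are assumptions made for this sketch, not the exact APIs introduced in this PR:

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class ContainerBalancerSmallClusterSketch {

  @ParameterizedTest
  @ValueSource(ints = {4, 5, 6, 7, 10})
  void balancerHandlesSmallClusters(int datanodeCount) {
    // Assumed constructors: the real MockedSCM/TestableCluster signatures may differ.
    TestableCluster cluster = new TestableCluster(datanodeCount);
    MockedSCM mockedScm = new MockedSCM(cluster);
    // ... configure ContainerBalancerConfiguration, run one ContainerBalancerTask
    // iteration against mockedScm, and assert that container moves are scheduled.
  }
}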
@JacksonYao287, @sumitagrawl, please review the changes
@Montura please merge latest master into your branch; the compile error (not caused by this PR) is fixed in 582a5cec45449caf13f341d9ad59201fcca2902c
@adoroszlai, UPD: done!
UPD: Today I rebased this PR on the master branch (to get the latest changes)
Thanks @Montura. Only need to update from master if there is a conflict, or failing checks need code from master. Also, please use merge, not rebase.
@siddhantsangwan @sumitagrawl can you please review?
@siddhantsangwan @sumitagrawl could you please review?
UPD: Today I merged master branch in this PR to resolve conflicts
@siddhantsangwan @sumitagrawl could you please review?
@Montura please keep in mind that end of year is usually a holiday season in many places
@Montura Sorry about the code conflicts. This PR does not allow edits from maintainers, is that intentional? If it did, I'd try to keep it updated after merging PRs touching the same files.
@siddhantsangwan @sumitagrawl please take a look at the patch, to provide high-level feedback until conflicts are resolved
Tomorrow I'll resolve the conflicts; it wasn't intentional to forbid PR editing for maintainers. It's my first PR here, I'll do better next time.
No worries.
@Montura Thanks for working on this. I'm trying to understand the problem. If you wrote a test that fails without your fix, please point me to it.
Sure, I'm merging current master now; when I finish, I'll point you to the test.
Sometimes the number of under-utilized nodes is not sufficient to satisfy the limit on the maximum percentage of datanodes participating in a balancing iteration (datanodes.involved.max.percentage.per.iteration). As a result, the collections of source and target datanodes are reset and balancing is skipped.
I didn't really get this. Can you please elaborate? It'd be helpful to have a small example where you describe this problem.
Let's imagine that you have a cluster with a total DN count anywhere in [4, 9] (4, 5, 6, 7, 8 or 9).
Then the maximum number of DNs that can be involved in balancing for such a cluster will be 1, because the default value of maxDatanodesRatioToInvolvePerIteration is 0.2 (20%). So the next two methods will skip balancing when the DN count is less than 10.
// ContainerBalancerTask#adaptWhenNearingIterationLimits
int maxDatanodesToInvolve = (int) (config.getMaxDatanodesRatioToInvolvePerIteration() * totalNodesInCluster);
if (countDatanodesInvolvedPerIteration + 1 == maxDatanodesToInvolve) {
  // Restricts potential target datanodes to nodes that have already been selected
}

// ContainerBalancerTask#adaptOnReachingIterationLimits
int maxDatanodesToInvolve = (int) (config.getMaxDatanodesRatioToInvolvePerIteration() * totalNodesInCluster);
if (countDatanodesInvolvedPerIteration == maxDatanodesToInvolve) {
  // Restricts potential source and target datanodes to nodes that have already been selected
}
// 4 * 0.2 = (0.8 -> cast_to_int) 0;
// 5 * 0.2 = (1.0 -> cast_to_int) 1;
// 6 * 0.2 = (1.2 -> cast_to_int) 1;
// 7 * 0.2 = (1.4 -> cast_to_int) 1;
// 8 * 0.2 = (1.6 -> cast_to_int) 1;
// 9 * 0.2 = (1.8 -> cast_to_int) 1;
// 10 * 0.2 = (2.0 -> cast_to_int) 2;
Per the Java spec's primitive narrowing conversion, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3).
The Java programming language uses round toward zero when converting a floating value to an integer (§5.1.3), which acts, in this case, as though the number were truncated, discarding the mantissa bits. Rounding toward zero chooses as its result the format's value closest to and no greater in magnitude than the infinitely precise result.
So we end up with under-utilized nodes that never take part in balancing at all. Every cluster with a DN count > 3 and < 10 will start balancing and do nothing because of the action in the ContainerBalancerTask#adaptWhenNearingIterationLimits method.
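If it helps, here is a tiny standalone snippet (plain Java, not Ozone code) that reproduces the truncation above:

public class NarrowingDemo {
  public static void main(String[] args) {
    double ratio = 0.2; // default datanodes.involved.max.percentage.per.iteration = 20%
    for (int totalNodes = 4; totalNodes <= 10; totalNodes++) {
      // double -> int narrowing rounds toward zero, so 4..9 nodes yield a limit of 0 or 1
      int maxDatanodesToInvolve = (int) (ratio * totalNodes);
      System.out.println(totalNodes + " nodes -> maxDatanodesToInvolve = " + maxDatanodesToInvolve);
    }
  }
}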
Ah, I understand. Yes, I've seen this happen in some small clusters. The recommendation is to increase the value of datanodes.involved.max.percentage.per.iteration accordingly. For example, it can be set to 100 for clusters of 15 Datanodes or less so that all Datanodes may be involved in balancing. Do you have any reason to not do this and make a code change instead?
It doesn't make sense to have a configuration datanodes.involved.max.percentage.per.iteration which imposes a limit, and then have another configuration adapt.balance.when.reach.the.limit which effectively disables the former limit. Why not just change datanodes.involved.max.percentage.per.iteration?
Ok, that makes sense.
Let me rewrite the tests to verify the desired behavior by increasing the value of datanodes.involved.max.percentage.per.iteration, and I'll revert the changes to the properties in hdds.scm.container.balancer.ContainerBalancerConfiguration.
What do you think?
UPD: I've updated the PR to use the datanodes.involved.max.percentage.per.iteration property in the containerBalancerShouldObeyMaxDatanodesToInvolveLimit test.
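For context, here is a minimal sketch of the kind of configuration tweak that test relies on, assuming ContainerBalancerConfiguration exposes a percentage setter roughly like the one below (the actual setter name and test wiring in the PR may differ):

// Assumed API: the setter name is illustrative, not confirmed from the PR.
ContainerBalancerConfiguration balancerConfig =
    ozoneConfiguration.getObject(ContainerBalancerConfiguration.class);
// Let every datanode participate, so a 4-9 node cluster is no longer capped at 0 or 1 datanodes.
balancerConfig.setMaxDatanodesPercentageToInvolvePerIteration(100);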
@siddhantsangwan, what about the current changes? Do you agree with them?
@siddhantsangwan, sorry about the duplicated request: what about the current changes? Do you agree with them?
Ok, I'll remove all the refactorings in ContainerBalancer, ContainerBalancerTask and TestContainerBalancerTask, and I will have to add some duplicated code in the new classes MockedSCM and TestableCluster. In the next PR I'll refactor TestContainerBalancerTask to remove the duplicates and reuse MockedSCM instances. Is that ok?
@Montura Is it possible to just add the new test using the existing testing framework in the current PR, while other improvements/refactoring can be done in future jiras?
It can be done in a very ugly way, because I need to run N testable clusters with different numbers of DNs, and each cluster encapsulates a ContainerBalancer and a ContainerBalancerTask too. For now only the ContainerBalancerTask field is shared between all the tests, and a cluster with a hardcoded 10 DNs is initialized by default before each test. Maybe I can create a new TestContainerBalancerTaskDatanodeNodeLimit file, move some test code from TestContainerBalancerTask, and use the new framework?
In the next PR I'll merge these two tests with the new test framework. Actually there is no new test framework: I just move the mocking code to MockedSCM and the code related to cluster initialization to TestableCluster. That is it.
@siddhantsangwan, please take a look
@adoroszlai, sure, I'll change it
@Montura Thanks for updating. Will try to review asap!
@siddhantsangwan, sorry for the reminder, do you have any updates?
@siddhantsangwan, sorry for the reminder once more, do you have any updates?
@siddhantsangwan, sorry for the reminder once more, could you approve these changes?