Incorrect handling of reserved resources
Problem Description
Fenzo misinterprets offers containing a mix of reserved and unreserved resources, causing it to fail to consider all offered resources. For example, given an offer of 2 reserved CPUs and 3 unreserved CPUs, Fenzo behaves as though the offer contains 2 (or 3) CPUs, not 5 CPUs as it should.
This situation arises when the operator (or another framework in the same role) reserves a subset of a host for the framework's role. This is an increasingly common phenomenon due to:
- the dynamic reservation feature, which makes it easy for an operator to make fine-grained reservations.
- the growing popularity of the dcos-commons library, which makes extensive use of dynamic reservations. A framework based on that library may use the same role as a Fenzo-based framework, leading to unintended side-effects.
Here's an example depicting the resources within such an offer (2 cpus for myrole, 3 unreserved):
cpus(myrole):2.0; mem(myrole):4096.0; ports(myrole):[1025-2180];
disk(*):28829.0; cpus(*):3.0; mem(*):10766.0; ports(*):[2182-3887,8082-8180,8182-32000]
Problem Location
The root cause is within com.netflix.fenzo.plugins.VMLeaseObject. The VMLeaseObject assumes that a given resource name (e.g. cpus) will appear at most once in the offer.
Suggested fix
VMLeaseObject should aggregate all resources with the same name (subject to a set of roles to filter on).
A suggested workaround is for the framework to use an alternate implementation of com.netflix.fenzo.VirtualMachineLease. See example here.
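The aggregation suggested above can be sketched roughly as follows. This is only an illustration, not Fenzo's or Mesos's actual code: `ScalarResource` is a hypothetical stand-in for the protobuf resource messages in a real offer, and `total` is an assumed helper name. It sums every resource with a given name across the roles the framework accepts.

```java
import java.util.List;
import java.util.Set;

public class ResourceAggregation {
    // Hypothetical stand-in for a scalar resource in an offer (not the Mesos protobuf type).
    public static final class ScalarResource {
        public final String name;
        public final String role;
        public final double value;

        public ScalarResource(String name, String role, double value) {
            this.name = name;
            this.role = role;
            this.value = value;
        }
    }

    // Sum all resources with the given name whose role is in acceptedRoles,
    // instead of assuming the name appears at most once in the offer.
    public static double total(List<ScalarResource> offer, String name, Set<String> acceptedRoles) {
        double sum = 0.0;
        for (ScalarResource r : offer) {
            if (r.name.equals(name) && acceptedRoles.contains(r.role)) {
                sum += r.value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Scalar resources from the example offer in the problem description.
        List<ScalarResource> offer = List.of(
            new ScalarResource("cpus", "myrole", 2.0),
            new ScalarResource("mem", "myrole", 4096.0),
            new ScalarResource("disk", "*", 28829.0),
            new ScalarResource("cpus", "*", 3.0),
            new ScalarResource("mem", "*", 10766.0));
        Set<String> roles = Set.of("myrole", "*");
        System.out.println(total(offer, "cpus", roles)); // prints 5.0
        System.out.println(total(offer, "mem", roles));  // prints 14862.0
    }
}
```

Passing only `Set.of("*")` as the accepted roles would yield 3.0 for cpus, which is how a role filter could exclude reservations the framework should not use.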
A note to framework developers: to use reserved resources effectively, be sure to build the resource objects in your TaskInfo from the specific resources contained in the offer. If you're accepting an offer like the one shown above, your TaskInfo should contain multiple cpus resources (e.g. [cpus(myrole):2.0; cpus(*):3.0; ...]). See examples here and here.
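As a rough illustration of that advice, the sketch below splits a task's demand for one resource across the matching entries in an offer. It rests on assumptions that are not part of Fenzo or Mesos: `ScalarResource` is a hypothetical stand-in for the protobuf resource type, `carve` is an invented helper name, and drawing from reserved resources before unreserved ones is one possible policy among several.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskResourceSplit {
    // Hypothetical stand-in for a scalar resource (not the Mesos protobuf type).
    public static final class ScalarResource {
        public final String name;
        public final String role;
        public final double value;

        public ScalarResource(String name, String role, double value) {
            this.name = name;
            this.role = role;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + "(" + role + "):" + value;
        }
    }

    // Build the per-role resource entries for a task that needs `demand` units of `name`.
    // Assumed policy: consume reserved resources (role other than "*") first, then unreserved.
    public static List<ScalarResource> carve(List<ScalarResource> offer, String name, double demand) {
        List<ScalarResource> result = new ArrayList<>();
        double remaining = demand;
        for (boolean reservedPass : new boolean[] {true, false}) {
            for (ScalarResource r : offer) {
                boolean reserved = !"*".equals(r.role);
                if (remaining <= 0 || !r.name.equals(name) || reserved != reservedPass) {
                    continue;
                }
                double take = Math.min(remaining, r.value);
                if (take > 0) {
                    result.add(new ScalarResource(name, r.role, take));
                    remaining -= take;
                }
            }
        }
        if (remaining > 0) {
            throw new IllegalArgumentException("offer does not contain enough " + name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ScalarResource> offer = List.of(
            new ScalarResource("cpus", "myrole", 2.0),
            new ScalarResource("cpus", "*", 3.0));
        System.out.println(carve(offer, "cpus", 4.0)); // prints [cpus(myrole):2.0, cpus(*):2.0]
    }
}
```

In a real framework, each returned entry would correspond to one resource object in the TaskInfo, carrying the same role as the offer resource it was taken from.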
oh @EronWright this is a really good idea ! thanks