peer-os
Algorithm for RH limit checks
We need to improve the algorithm used for RH limit checks. The current implementation is too restrictive. For details on the current algorithm, see https://github.com/subutai-io/peer-os/wiki/RH-checking-algorithm
Yes, the current algorithm is not flexible. For example, if a Resource Host has a weak CPU but large RAM and disk, the limit check prevents RAM and disk from being fully used, because the CPU quota is quickly exhausted by new containers. The same can happen with RAM and disk: a Resource Host may have a strong CPU but little RAM or disk storage.
Of CPU, RAM and disk, the most volatile resource is CPU, followed by RAM. Disk is the resource that requires exact measurement. So I think that, to begin with, we should skip checking CPU limits. For RAM, we could introduce a volatility factor, such as 80%. I.e. if an existing container's quota is 4 GB and its historical consumption for the last hour is 2 GB, the "limit check algorithm" should subtract 4 * 0.8 = 3.2 GB (not the whole 4 GB) from the available resources.
Also, limit check values may vary depending on the ratio of RAM to disk and vice versa. I.e. if a Resource Host has little RAM but a huge disk, the RAM volatility factor might be lower.
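To make the RAM proposal concrete, here is a minimal sketch of how a volatility factor could enter the check. The function names and the rule of never reserving less than the observed consumption are assumptions for illustration, not part of the existing peer-os code.

```python
def reserved_ram_gb(quota_gb, historical_usage_gb, volatility_factor=0.8):
    """RAM to subtract from the host's available pool for one container.

    Assumption: reserve quota * factor, but never less than the observed
    historical consumption and never more than the quota itself.
    """
    if not 0.0 < volatility_factor <= 1.0:
        raise ValueError("volatility_factor must be in (0, 1]")
    discounted = quota_gb * volatility_factor
    return min(quota_gb, max(discounted, historical_usage_gb))


def available_ram_gb(host_total_gb, containers, volatility_factor=0.8):
    """containers: iterable of (quota_gb, historical_usage_gb) pairs."""
    reserved = sum(
        reserved_ram_gb(quota, usage, volatility_factor)
        for quota, usage in containers
    )
    return host_total_gb - reserved


# The example from the comment: 4 GB quota, 2 GB used in the last hour.
example = reserved_ram_gb(4.0, 2.0)  # 4 * 0.8, not the whole 4 GB
```

With this rule, a container that actually uses more than 80% of its quota is still counted at its real consumption, so the discount only applies to containers that leave headroom.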
@lbthomsen @niclash your comments are requested
I think the underlying issue is that CPU is treated as "reserved" per container rather than measured, and users can't choose whether they want to "reserve" CPU or are OK with "shared" CPU. If we treat the containers as "reserved", utilization will probably look dismal, since many containers will use very little CPU. Fixing this would require various changes, such as allowing users to reserve CPU (for a fee) rather than just "get some CPU". In the current situation that is a rather big change, and it quickly turns into the full resource management system that should be in place for RH owners and container deployers.
So, in the short term, I recommend:
a. RAM is reserved per container, and not allocated by demand at all. It is the primary "selector" of a container (tiny SIZE, not tiny SPEED).
b. CPU load limits per container already exist and can remain as described on the "RH-checking-algorithm" page.
c. Initially, only allocate containers up to the available "CPU capacity", i.e. 100% * NoOfCPUs.
d. Measure each container's usage over a "long" period, say a week, and add 80% of the "not used" part to the "CPU capacity" of the RH.
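Items c and d can be sketched roughly as follows. This is only an illustration under stated assumptions: quotas and usage are expressed in percent of one core, the weekly average usage per container is already available, and all names here are hypothetical rather than taken from peer-os.

```python
RECLAIM_FACTOR = 0.8  # share of the unused quota returned to the pool (item d)


def base_cpu_capacity(number_of_cpus):
    """Item c: capacity is 100% per CPU."""
    return 100.0 * number_of_cpus


def effective_cpu_capacity(number_of_cpus, containers):
    """containers: iterable of (quota_percent, weekly_avg_usage_percent).

    Item d: add 80% of each container's unused quota back to the capacity.
    """
    unused = sum(max(0.0, quota - usage) for quota, usage in containers)
    return base_cpu_capacity(number_of_cpus) + RECLAIM_FACTOR * unused


def can_place(number_of_cpus, containers, new_quota_percent):
    """True if a new container's CPU quota fits into what is left."""
    allocated = sum(quota for quota, _ in containers)
    remaining = effective_cpu_capacity(number_of_cpus, containers) - allocated
    return new_quota_percent <= remaining


# Hypothetical 4-CPU host with two lightly used containers.
existing = [(100.0, 20.0), (200.0, 50.0)]
fits = can_place(4, existing, 150.0)
```

In this hypothetical case only 100% would remain under pure reservation (400% capacity, 300% allocated), so a 150% container would be rejected; with the reclaimed share it fits.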
a. The algo considers RAM as reserved already, so done
b. then it is done
c. Right now we calculate it as availableCpu = numberOfCores * idleCpu
d. OK, I can increase the historical metric window to 1 week instead of 2 hours
What about the other resources, such as disk? Do we use them in the calculations? @niclash
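For reference, the formula in item c combined with the longer window from item d might look like the sketch below. The sample format, the function name and the use of a plain average are assumptions; the real metric source in peer-os may differ.

```python
from datetime import datetime, timedelta


def available_cpu(samples, number_of_cores, now, window=timedelta(weeks=1)):
    """availableCpu = numberOfCores * idleCpu, with idleCpu averaged
    over the historical window.

    samples: list of (timestamp, idle_cpu_percent) for the host.
    """
    recent = [idle for ts, idle in samples if now - ts <= window]
    if not recent:
        raise ValueError("no samples inside the window")
    idle_cpu = sum(recent) / len(recent)
    return number_of_cores * idle_cpu


# Hypothetical samples: mostly idle three days ago, busier in the last hour.
now = datetime(2024, 1, 8, 12, 0)
samples = [(now - timedelta(days=3), 90.0), (now - timedelta(hours=1), 50.0)]
week = available_cpu(samples, 4, now)                             # 1-week window
two_hours = available_cpu(samples, 4, now, timedelta(hours=2))    # old window
```

A longer window smooths out short bursts, so a host that was briefly busy is not treated as permanently full.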