modelmesh
modelmesh copied to clipboard
Models can not be scaled up to 2 copies
Context
I am learning how the auto scaling of the model mesh works. I found this piece of docs:
Models will scale to two copies if they have been used recently regardless of the load - the autoscaling behaviour applies between 2 and N>2 copies.
It is so vague that I have to dive in the source code, and then I found this code:
// For 1->2 copies, scale-up can also be triggered by a pattern of recent usage
// See explanation of CacheEntry#usageSlices
if (loadedCount == 1) {
// assert mr.getInstanceIds().containsKey(instanceId);
int i1 = ce.earlierUseIteration, i2 = ce.lastUsedIteration;
// invariants: lower < upper, i1 <= i2
if (logger.isDebugEnabled()) {
logger.debug("Second copy trigger evaluation for model " + modelId
+ ": target range [" + lower + ", " + upper + "], I1="
+ i1 + ", I2=" + i2 + ", curIteration=" + iterationCounter);
}
boolean i1inRange = false, i2inRange = false;
if (i2 >= lower && i1 <= upper) {
i1inRange = i1 >= lower;
i2inRange = i2 <= upper;
}
if (i2inRange || !i1inRange) {
ce.earlierUseIteration = i2;
}
ce.lastUsedIteration = iterationCounter;
if (i1inRange || i2inRange) {
// Model was used within the target range [MIN_AGE, MAX_AGE] iterations ago
// so trigger loading of a second copy
// Don't do it if > 90% full and cache is younger than secondCopyLruThresholdMillis
if ((10 * clusterStats.totalFree) / clusterStats.totalCapacity >= 1
|| (now - clusterStats.globalLru) > secondCopyLruThresholdMillis) {
logger.info("Attempting to add second copy of model " + modelId
+ " in another instance since \"regular\" usage was detected");
ensureLoadedInternalAsync(modelId, lastTime, ce.getWeight(), excludeThisInstance, 0);
continue;
}
}
}
As far as I understand, the logic if a model is recently used and there is a prior usage of it falling into the interval of 40 minutes and 7 minutes before the correspond time, the model should be scaled to 2 copies.
Current behaviour
If a model is consistently used, the earlierUseIteration and lastUsedIteration will be updated continuously, to the last check time. That logic is indicated in these lines of code:
if (i2inRange || !i1inRange) {
ce.earlierUseIteration = i2;
}
ce.lastUsedIteration = iterationCounter;
i2inRange and i1inRange will never be true, since both i1 i2 will always be updated to the most recently point of time and consequently exceed the upper point. Therefore a model has to be used once, wait for 7 minutes without receiving any requests in order to be scaled to 2 copies ( I have tested that behaviour). Having to wait for 7 minutes
Expectation
If a model is being used consistently for over 7 minutes, that model should be scaled to 2 copies.
Suggestion
Perhaps the point of time that the oldest request that not exceed 40 minutes should be recorded instead of the earlierUseIteration. The remaining logic is the same.
Hi @haiminh2001, here is the commit with more details about the motivation, not sure if you already see it, if not, it can help understand it better. https://github.com/kserve/modelmesh/commit/2790ef244b0cfdf6b83bf98894c2eff0e8ea0805#diff-1955f855ee9b6cf2eed89d74caca0b4a5181d856c645c4247561ac6f140aace4R503
Hi @spolti , I have read the commit before, but in short, my question is: Why using a model continuously would not trigger the second copy, but I have to use the model, wait for 7 minutes without any usage and then finally use the model again? That does not make sense to me.
Piggy-back on the existing frequent rate-tracking task to keep track of "iteration numbers" in which single-copy models are loaded, and only trigger a second copy when there's a prior usage more than 7 minutes but less than 40 minutes ago. If the usage is confined to a < 7min window it could be isolated; if > 40min apart the value of a second copy is lower (probability of pod death causing disruption is minimal).
And in the commit said: there's a prior usage more than 7 minutes but the code logic means the exact last usage, not any prior usage, that has to be in the interval of 7 and 40 minutes ago.
Hi @spolti, I fixed the logic and opened a PR. Can you have a look ? Thank you in advanced.