modelmesh
Models can not be scaled up to 2 copies
Context
I am learning how the auto scaling of the model mesh works. I found this piece of docs:
Models will scale to two copies if they have been used recently regardless of the load - the autoscaling behaviour applies between 2 and N>2 copies.
This is vague enough that I had to dive into the source code, where I found the following:
    // For 1->2 copies, scale-up can also be triggered by a pattern of recent usage
    // See explanation of CacheEntry#usageSlices
    if (loadedCount == 1) {
        // assert mr.getInstanceIds().containsKey(instanceId);
        int i1 = ce.earlierUseIteration, i2 = ce.lastUsedIteration;
        // invariants: lower < upper, i1 <= i2
        if (logger.isDebugEnabled()) {
            logger.debug("Second copy trigger evaluation for model " + modelId
                    + ": target range [" + lower + ", " + upper + "], I1="
                    + i1 + ", I2=" + i2 + ", curIteration=" + iterationCounter);
        }
        boolean i1inRange = false, i2inRange = false;
        if (i2 >= lower && i1 <= upper) {
            i1inRange = i1 >= lower;
            i2inRange = i2 <= upper;
        }
        if (i2inRange || !i1inRange) {
            ce.earlierUseIteration = i2;
        }
        ce.lastUsedIteration = iterationCounter;
        if (i1inRange || i2inRange) {
            // Model was used within the target range [MIN_AGE, MAX_AGE] iterations ago
            // so trigger loading of a second copy
            // Don't do it if > 90% full and cache is younger than secondCopyLruThresholdMillis
            if ((10 * clusterStats.totalFree) / clusterStats.totalCapacity >= 1
                    || (now - clusterStats.globalLru) > secondCopyLruThresholdMillis) {
                logger.info("Attempting to add second copy of model " + modelId
                        + " in another instance since \"regular\" usage was detected");
                ensureLoadedInternalAsync(modelId, lastTime, ce.getWeight(), excludeThisInstance, 0);
                continue;
            }
        }
    }
As far as I understand, the logic is: if a model has been used recently and it also has a prior usage that falls between 40 minutes and 7 minutes before the current check, the model should be scaled to 2 copies.
Current behaviour
If a model is used consistently, earlierUseIteration and lastUsedIteration are continuously updated to the most recent check iterations. This happens in these lines of code:
    if (i2inRange || !i1inRange) {
        ce.earlierUseIteration = i2;
    }
    ce.lastUsedIteration = iterationCounter;
i1inRange and i2inRange will never be true, because both i1 and i2 are always updated to the most recent points in time and are therefore always newer than the upper bound of the target range. As a result, a model has to be used once and then go 7 minutes without receiving any requests in order to be scaled to 2 copies (I have tested this behaviour). Having to wait for 7 minutes
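To make the behaviour easier to reproduce outside a cluster, here is a minimal sketch that replays only the range check and the two field updates from the quoted code. Everything else is an assumption made for illustration: the class name `SecondCopyTriggerSim`, the constants (7 and 40 minutes expressed as 10-second iterations), the "never used" starting value, and the way `lower` and `upper` are derived from `iterationCounter`. The capacity check is left out and the model is assumed to stay at one loaded copy.

```java
/** Hypothetical, simplified replay of the quoted 1->2 copy trigger (not the real ModelMesh class). */
class SecondCopyTriggerSim {
    // Assumed values: 7 and 40 minutes expressed in 10-second iterations.
    static final int MIN_AGE_ITERATIONS = 42;
    static final int MAX_AGE_ITERATIONS = 240;

    // Assumed starting state meaning "never used".
    int earlierUseIteration = Integer.MIN_VALUE;
    int lastUsedIteration = Integer.MIN_VALUE;

    /** Call for an iteration in which the model was used; returns true if a second copy would be requested. */
    boolean onUsed(int iterationCounter) {
        // Assumed derivation of the target range [MAX_AGE, MIN_AGE] iterations ago.
        int lower = iterationCounter - MAX_AGE_ITERATIONS;
        int upper = iterationCounter - MIN_AGE_ITERATIONS;
        int i1 = earlierUseIteration, i2 = lastUsedIteration;
        boolean i1inRange = false, i2inRange = false;
        if (i2 >= lower && i1 <= upper) {
            i1inRange = i1 >= lower;
            i2inRange = i2 <= upper;
        }
        if (i2inRange || !i1inRange) {
            earlierUseIteration = i2;
        }
        lastUsedIteration = iterationCounter;
        return i1inRange || i2inRange;
    }

    /** Replays the given "used" iterations in order and counts how often the trigger fires. */
    static int countTriggers(int... usedIterations) {
        SecondCopyTriggerSim sim = new SecondCopyTriggerSim();
        int triggers = 0;
        for (int iteration : usedIterations) {
            if (sim.onUsed(iteration)) {
                triggers++;
            }
        }
        return triggers;
    }

    public static void main(String[] args) {
        // Used in every iteration for 300 iterations (50 minutes under the assumed interval).
        int[] steady = new int[300];
        for (int k = 0; k < steady.length; k++) {
            steady[k] = 1000 + k;
        }
        System.out.println("steady usage triggers: " + countTriggers(steady));
        // Used once, then again 50 iterations (a little over 8 assumed minutes) later.
        System.out.println("gap usage triggers: " + countTriggers(1000, 1050));
    }
}
```

Under these assumptions the steady-usage run reports 0 triggers, while the run with a gap longer than the minimum age reports 1, which matches what I observed.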
Expectation
If a model is being used consistently for over 7 minutes, that model should be scaled to 2 copies.
Suggestion
Perhaps the time of the oldest request that is no more than 40 minutes old should be recorded instead of earlierUseIteration. The remaining logic can stay the same.
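A rough sketch of what I have in mind is below. It is not a patch against ModelMesh: the class `OldestUseTriggerSketch`, the field `oldestUseIteration`, and the constants (again 7 and 40 minutes as assumed 10-second iterations) are hypothetical names for illustration. It is also simplified in one respect: when the recorded oldest use falls out of the 40-minute window, it restarts from the current iteration instead of looking up the next-oldest use, which would need a history of usages.

```java
/** Hypothetical sketch of the suggested tracking; names and constants are assumptions. */
class OldestUseTriggerSketch {
    // Assumed values: 7 and 40 minutes expressed in 10-second iterations.
    static final int MIN_AGE_ITERATIONS = 42;
    static final int MAX_AGE_ITERATIONS = 240;
    static final int NEVER = Integer.MIN_VALUE;

    // Oldest recorded use that is still inside the max-age window.
    int oldestUseIteration = NEVER;

    /** Call for an iteration in which the model was used; returns true if a second copy would be requested. */
    boolean onUsed(int iterationCounter) {
        int lower = iterationCounter - MAX_AGE_ITERATIONS;
        int upper = iterationCounter - MIN_AGE_ITERATIONS;
        // Simplification: restart the window once the recorded use is too old.
        if (oldestUseIteration == NEVER || oldestUseIteration < lower) {
            oldestUseIteration = iterationCounter;
        }
        // In range when the oldest use is at least MIN_AGE iterations old.
        return oldestUseIteration <= upper;
    }

    /** Uses the model in every iteration from start; returns the first triggering iteration, or -1. */
    static int firstTrigger(int start, int count) {
        OldestUseTriggerSketch sketch = new OldestUseTriggerSketch();
        for (int k = 0; k < count; k++) {
            if (sketch.onUsed(start + k)) {
                return start + k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first trigger at iteration: " + firstTrigger(1000, 300));
    }
}
```

With steady usage starting at iteration 1000, this sketch first triggers at iteration 1042, i.e. once the model has been in use for the assumed 7 minutes, which is the expectation described above.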