Jingyuan
Well, "incrementalMode" is what I need, that saves a lot of time for large directories.
Aibrix currently disables workload monitoring by default in the Gateway Plugin. Without workload monitoring, the GPU optimizer cannot know the workload characteristics. To enable workload monitoring, configure the...
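Since the exact setting is cut off above, here is a minimal sketch of the kind of change meant, assuming workload monitoring is toggled by an environment variable on the gateway-plugins deployment (the variable name, deployment name, and namespace below are assumptions, not confirmed values; check the docs for the real setting):
```
# Sketch only: variable/deployment/namespace names are assumptions.
kubectl set env deployment/aibrix-gateway-plugins \
  -n aibrix-system AIBRIX_GPU_OPTIMIZER_TRACING_FLAG=true
kubectl rollout restart deployment/aibrix-gateway-plugins -n aibrix-system
```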
BTW, the minimum solution the optimizer gave out is based on a label in the deployment configuration: "model.aibrix.ai/min_replicas", which specifies the minimum replica count in heterogeneous/multi-GPU deployments if there...
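For reference, a minimal sketch of what that label looks like on a deployment; only the model.aibrix.ai/min_replicas key itself comes from the point above, the deployment name and companion labels are placeholders:
```
# Illustrative sketch: names other than the min_replicas label are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama2-7b-a100
  labels:
    model.aibrix.ai/name: llama2-7b
    model.aibrix.ai/min_replicas: "1"   # minimum replicas the optimizer keeps for this deployment
spec:
  replicas: 1
  # ... rest of the pod template as usual
```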
I found two log entries in one round of optimization (within a 10s optimization interval), suggesting that you have two models running concurrently. I think there is no workload for...
These logs do not seem to be consistent with the previous ones: they show that the profile is not applied, so the cost is reported as $inf.
Well, can you enable the -debug option for the gpu-optimizer by using the following commands:
```
kubectl delete -k config/overlays/dev/gpu-optimizer
kubectl apply -k config/overlays/dev/gpu-optimizer
```
And show me the component...
In the podautoscaler settings, the targetValue is set to "1", so KPA will scale in integer multiples of what the gpu_optimizer suggests. However, we currently depend on the KPA algorithm to stabilize...
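For context, a sketch of the PodAutoscaler spec being referred to, assuming the metric is pulled from the gpu-optimizer over HTTP. Apart from targetValue: "1", the apiVersion, field names, endpoint, and metric name are my assumptions and should be checked against the repo:
```
# Sketch only: field names, endpoint, and metric name are assumptions; the key point is targetValue: "1".
apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
  name: llama2-7b-optimizer-kpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llama2-7b
  minReplicas: 1
  maxReplicas: 8
  scalingStrategy: KPA
  metricsSources:
    - metricSourceType: domain
      protocolType: http
      endpoint: aibrix-gpu-optimizer.aibrix-system.svc.cluster.local:8080
      path: /metrics/default/llama2-7b
      targetMetric: vllm:deployment_replicas
      targetValue: "1"   # desired replicas = metric / targetValue, so KPA tracks the optimizer's integer suggestion
```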
In fact, my SLO-based routing policy introduced a new abstraction called QueueRouter, which allows reordering requests based on the queue type. The QueueRouter is a framework-level router that bridges between...
> If other routing policies need delay scheduling as a common capability to leverage, is it possible? Or does it only work with the SLO-based routing policy? I am trying to understand...
@Jeffwan The worker YAML (config/metadata/job_template_patch.yaml) must be customized to use a custom image; otherwise it will only use the aibrix/mock image.
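To make that concrete, a sketch of the kind of patch meant here; the structure below is my assumption about what config/metadata/job_template_patch.yaml contains, and only the file path and the aibrix/mock default come from the message:
```
# Hypothetical content for config/metadata/job_template_patch.yaml:
# override the worker image so the job does not fall back to the aibrix/mock image.
apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-worker            # placeholder name
spec:
  template:
    spec:
      containers:
        - name: worker              # placeholder container name
          image: my-registry/my-benchmark-worker:latest   # your customized image instead of aibrix/mock
```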