Jingyuan
Well, "incrementalMode" is what I need, that saves a lot of time for large directories.
Aibrix currently disables workload monitoring by default in the Gateway Plugin. Without workload monitoring, the GPU optimizer cannot know the workload characteristics. To enable workload monitoring, configure the...
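Since the exact setting is cut off above, here is a minimal sketch of the kind of change meant, assuming workload monitoring is toggled by an environment variable on the gateway-plugins deployment (the variable name, deployment name, and namespace below are assumptions, not confirmed values; check the docs for the real setting):
```
# Sketch only: variable/deployment/namespace names are assumptions.
kubectl set env deployment/aibrix-gateway-plugins \
  -n aibrix-system AIBRIX_GPU_OPTIMIZER_TRACING_FLAG=true
kubectl rollout restart deployment/aibrix-gateway-plugins -n aibrix-system
```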
BTW, the minimum solution the optimizer gave out is based on a label in the deployment configuration: "model.aibrix.ai/min_replicas", which specifies the minimum replica count in heterogeneous/multi-GPU deployments if there...
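For reference, a minimal sketch of what that label looks like on a deployment; only the model.aibrix.ai/min_replicas key itself comes from the point above, the deployment name and companion labels are placeholders:
```
# Illustrative sketch: names other than the min_replicas label are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama2-7b-a100
  labels:
    model.aibrix.ai/name: llama2-7b
    model.aibrix.ai/min_replicas: "1"   # minimum replicas the optimizer keeps for this deployment
spec:
  replicas: 1
  # ... rest of the pod template as usual
```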
I found two log entries in one round of optimization (within a 10s optimization interval), suggesting that you have two models running concurrently. I think there is no workload for...
These logs do not seem to be consistent with the previous ones: they show that the profile is not applied, so the cost is reported as $inf.
Well, can you enable the -debug option for the gpu-optimizer by using the following commands:
```
kubectl delete -k config/overlays/dev/gpu-optimizer
kubectl apply -k config/overlays/dev/gpu-optimizer
```
And show me the component...
In the podautoscaler settings, the targetValue is set to "1", so KPA will scale in integer multiples of what the gpu_optimizer suggests. However, we currently depend on the KPA algorithm to stabilize...
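For context, a sketch of the PodAutoscaler spec being referred to, assuming the metric is pulled from the gpu-optimizer over HTTP. Apart from targetValue: "1", the apiVersion, field names, endpoint, and metric name are my assumptions and should be checked against the repo:
```
# Sketch only: field names, endpoint, and metric name are assumptions; the key point is targetValue: "1".
apiVersion: autoscaling.aibrix.ai/v1alpha1
kind: PodAutoscaler
metadata:
  name: llama2-7b-optimizer-kpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llama2-7b
  minReplicas: 1
  maxReplicas: 8
  scalingStrategy: KPA
  metricsSources:
    - metricSourceType: domain
      protocolType: http
      endpoint: aibrix-gpu-optimizer.aibrix-system.svc.cluster.local:8080
      path: /metrics/default/llama2-7b
      targetMetric: vllm:deployment_replicas
      targetValue: "1"   # desired replicas = metric / targetValue, so KPA tracks the optimizer's integer suggestion
```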
In fact, my SLO-based routing policy introduced a new abstraction called QueueRouter, which allows reordering requests based on the queue type. The QueueRouter is a framework-level router that bridges between...
> If other routing policies need delay scheduling as a common capability to leverage, is it possible? Or does it only work with the SLO-based routing policy? I am trying to understand...
@Jeffwan The worker YAML (config/metadata/job_template_patch.yaml) must be customized to use a custom image; otherwise it will only use the aibrix/mock image.
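To make that concrete, a sketch of the kind of patch meant here; the structure below is my assumption about what config/metadata/job_template_patch.yaml contains, and only the file path and the aibrix/mock default come from the message:
```
# Hypothetical content for config/metadata/job_template_patch.yaml:
# override the worker image so the job does not fall back to the aibrix/mock image.
apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-worker            # placeholder name
spec:
  template:
    spec:
      containers:
        - name: worker              # placeholder container name
          image: my-registry/my-benchmark-worker:latest   # your customized image instead of aibrix/mock
```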