Jiaxin Shan comments

Results: 742 comments of Jiaxin Shan

Enable `apply` to replace `create` in manifest deployment

## --server-side

### dependency

```
error: Apply failed with 1 conflict: conflict with "kubectl-client-side-apply" using apps/v1: .spec.template.spec.containers[name="envoy-gateway"].resources.limits.memory
Please review the fields above--they currently have other managers. Here are the ways...
```

Enable `apply` to replace `create` in manifest deployment

@andyluo7 Due to some dependency issues, it's not easy to switch to `apply`. We will talk with the maintainers or switch to our own distribution later. Please stick to...

RTX 5090 D GPU compatibility issue: CUDA error causes service crash

@kerthcet Currently, the gateway cache and the inference cache are two separate cache systems. This separation means they can get out of sync. We have contemplated synchronizing the engine and...

GPU optimiser replicas not scaling

/cc @zhangjyr please help take a look.

[Misc] Support adapter scaling to all replicas

@dittops Great! I will spend some time this week reviewing this change.

[Misc] Support adapter scaling to all replicas

@dittops I think the only part I am not sure about is the scheduling part. Can you give more details?

[Misc] Support adapter scaling to all replicas

@dittops The workflow sounds good. From the change, I notice the LoRA scheduling logic has been deleted. In this case, how are pods selected? ![image](https://github.com/user-attachments/assets/faf37ddc-5c70-4f33-a3a1-dd42e79d1c74)

[Misc] Support adapter scaling to all replicas

@dittops Yeah, I think the behavior has changed a bit recently. Option 1: Schedule the LoRA model to specific pods based on the specified replicas. Option 2: Load the LoRA...

[Misc] Support adapter scaling to all replicas

@dittops Exactly. https://github.com/vllm-project/aibrix/blame/main/api/model/v1alpha1/modeladapter_types.go#L53

[Misc] Support adapter scaling to all replicas

@dittops Apologies for the late response. I have recently been refactoring the LoRA work to provide better production-level support. I want to merge this one first before I refactor the code. However...