Jiaxin Shan comments

Results: 742 comments of Jiaxin Shan

Support different lora adapter artifact registry

vLLM side

```
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1", "lora_path": "bharati2324/Qwen2.5-1.5B-Instruct-Code-LoRA-r16v2"}'

curl -X POST http://localhost:8000/v1/unload_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1"}'
```

...
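For callers that prefer a script over curl, the same two requests can be sketched in Python. This is a minimal sketch that only builds the requests and does not send them; the helper name `build_lora_request` is hypothetical, while the endpoint paths, host, port, and payload fields are copied from the curl commands in the comment above.

```python
import json
import urllib.request

# Same host and port as the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_lora_request(action, payload, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/{action}_lora_adapter.

    `build_lora_request` is a hypothetical helper for illustration only.
    """
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action}")
    return urllib.request.Request(
        url=f"{base_url}/v1/{action}_lora_adapter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


load_req = build_lora_request(
    "load",
    {
        "lora_name": "text2sql-lora-1",
        "lora_path": "bharati2324/Qwen2.5-1.5B-Instruct-Code-LoRA-r16v2",
    },
)
unload_req = build_lora_request("unload", {"lora_name": "text2sql-lora-1"})

# To actually send a request against a running server:
#   with urllib.request.urlopen(load_req) as resp:
#       print(resp.status)
```

Building the request separately from sending it keeps the sketch runnable without a live vLLM server.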

Support different lora adapter artifact registry

## Testing

1. Update the controller manager settings:

```
- --enable-runtime-sidecar
```

2. Rebuild controller-manager and runtime.

Support different lora adapter artifact registry

This cannot be closed even with #580. We don't yet handle orchestration such as model download + model registration; currently it is still a single step.

Support different lora adapter artifact registry

Absolute paths are now supported, so we can mount a PVC. I will postpone the artifact download part to a future release.
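As a rough illustration of the distinction this comment draws, the sketch below separates an absolute filesystem path (for example, a directory on a mounted PVC) from any other `lora_path` value such as a repository identifier. The helper name `is_local_adapter_path` and the mount point `/models/loras` are assumptions made for the example and are not taken from the project.

```python
from pathlib import PurePosixPath


def is_local_adapter_path(lora_path: str) -> bool:
    """Return True when lora_path is an absolute POSIX path.

    Hypothetical helper: an absolute path is treated as an adapter that
    already exists on a mounted volume, so no artifact download is needed.
    """
    return PurePosixPath(lora_path).is_absolute()


# Assumed PVC mount point, for illustration only.
pvc_path = "/models/loras/text2sql-lora-1"
repo_id = "bharati2324/Qwen2.5-1.5B-Instruct-Code-LoRA-r16v2"

local_flags = {p: is_local_adapter_path(p) for p in (pvc_path, repo_id)}
```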

RayClusterFleet controller shows some reconciliation issues

It seems to be related to my misconfiguration. I didn't fully clean up the volumeMounts, which caused the error. ![Image](https://github.com/user-attachments/assets/12f938c0-ef4e-4c4f-8261-13a8bc2a0ddf)

[Dist KV] vllm pods which do not have kvcache pods running in the same node crashes.

> vllm pods which do not have kvcache pods running in the same node crashes.

If the node running engine pods doesn't have a cache pod, the engine pod will crash. affinity...

[RFC]: Add Support for Prefill/Decode (P/D) Disaggregation in vLLM

@kdtmac Phase I and Phase II will be done in v0.4.0. Currently, we'd like to reuse `NixlConnector` directly and probably work on MultiConnector to support both AIBrixKVOffloading + NixlConnector. For...

[RFC]: Add Support for Prefill/Decode (P/D) Disaggregation in vLLM

@ying2025 The router will select the right P/D pair for communication; we do have some algorithms that can do a better job than the default vLLM/SGLang settings. Framework support focuses more on the...

[RFC]: Add Support for Prefill/Decode (P/D) Disaggregation in vLLM

@ying2025 Sorry for the late reply. That makes sense: the prefix cache is considered in the routing decision to reduce TTFT.
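To make the idea of prefix-aware routing concrete, here is a toy sketch. It is not the router's actual algorithm: the function names, the token-list representation of cached prefixes, and the tie-break on load are all assumptions for illustration. It picks the pod whose cached prefix overlaps most with the incoming prompt, since a longer reusable prefix means less prefill work and therefore a lower TTFT.

```python
def shared_prefix_len(a, b):
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_pod(prompt, cached_prefixes, load):
    """Pick the pod with the longest cached prefix match.

    Hypothetical inputs:
      prompt          - list of token ids for the incoming request
      cached_prefixes - dict: pod name -> list of cached token sequences
      load            - dict: pod name -> number of in-flight requests
    Ties on prefix length go to the less loaded pod.
    """
    def score(pod):
        best = max(
            (shared_prefix_len(prompt, p) for p in cached_prefixes.get(pod, [])),
            default=0,
        )
        return (best, -load[pod])

    return max(load, key=score)


cached = {"pod-a": [[1, 2]], "pod-b": [[1, 2, 3]], "pod-c": []}
load = {"pod-a": 0, "pod-b": 5, "pod-c": 0}
chosen = pick_pod([1, 2, 3, 4], cached, load)
```

In this example `pod-b` is chosen despite its higher load, because it holds the longest matching prefix; a real router would likely weigh cache hits against load rather than rank them strictly.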

[RFC]: Add Support for Prefill/Decode (P/D) Disaggregation in vLLM

There is some remaining work: autoscaling/podgroup and some other community offerings will be supported in a future release. The current P/D orchestration and routing should be good for the v0.4.0 release. We can...