Jiaxin Shan
vLLM side:

```
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1", "lora_path": "bharati2324/Qwen2.5-1.5B-Instruct-Code-LoRA-r16v2"}'

curl -X POST http://localhost:8000/v1/unload_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "text2sql-lora-1"}'
```

...
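To script the same two calls, a minimal standard-library Python sketch could look like this. It assumes the server is reachable at `http://localhost:8000` as in the curl commands, and that vLLM was started with LoRA support and runtime adapter updates enabled (in vLLM this is gated by `--enable-lora` and the `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True` environment variable; check the docs for your version). The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: vLLM OpenAI-compatible server address


def _lora_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of vLLM's LoRA adapter endpoints."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_lora_request(lora_name: str, lora_path: str) -> urllib.request.Request:
    # Same body as the first curl command above.
    return _lora_request("/v1/load_lora_adapter", {"lora_name": lora_name, "lora_path": lora_path})


def unload_lora_request(lora_name: str) -> urllib.request.Request:
    # Same body as the second curl command above.
    return _lora_request("/v1/unload_lora_adapter", {"lora_name": lora_name})


def send(req: urllib.request.Request) -> str:
    """Send the request and return the response body as text (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Usage against a running server: `send(load_lora_request("text2sql-lora-1", "bharati2324/Qwen2.5-1.5B-Instruct-Code-LoRA-r16v2"))`, then `send(unload_lora_request("text2sql-lora-1"))`. Building the request separately from sending it keeps the payloads easy to inspect without a live server.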
## Testing

1. Update the controller manager settings:
   ```
   - --enable-runtime-sidecar
   ```
2. Rebuild the controller-manager and runtime.
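If the controller manager runs as a Deployment, step 1 amounts to appending the flag to the manager container's `args`. A minimal Python sketch over a manifest dict (for example, one exported with `kubectl get deployment <name> -o json`) could look like the following; the container name `manager` is an assumption (it is the Kubebuilder default), and `ensure_arg` is a hypothetical helper.

```python
from __future__ import annotations

import copy


def ensure_arg(deployment: dict, flag: str, container: str = "manager") -> dict:
    """Return a copy of a Deployment manifest in which `flag` appears once in the container's args."""
    updated = copy.deepcopy(deployment)  # never mutate the caller's manifest
    for c in updated["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            args = c.setdefault("args", [])
            if flag not in args:  # idempotent: re-running does not duplicate the flag
                args.append(flag)
            return updated
    raise KeyError(f"container {container!r} not found in deployment")
```

The result can be written back with `kubectl apply -f`. Making the helper idempotent means the same step can be rerun safely while iterating on the rebuild.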
This cannot be closed even with #580. We don't yet handle orchestration such as model download plus model registration; currently it's still a single step.
Absolute paths are now supported, so we can mount a PVC. I will postpone the artifact download part to a future release.
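With absolute paths supported, an adapter stored on a PVC can be loaded by mounting the claim into the engine container and passing the in-container absolute path (e.g. a hypothetical `/models/lora/text2sql-lora-1`) as `lora_path`. A minimal Python sketch that adds the volume and mount to a pod spec dict; `mount_pvc`, the claim name, volume name, container name and mount path are all hypothetical. It refuses to add a volume or mount whose name already exists, since stale `volumeMounts` entries easily leave a spec broken.

```python
from __future__ import annotations

import copy


def mount_pvc(pod_spec: dict, container: str, claim: str, mount_path: str,
              volume_name: str = "lora-models") -> dict:
    """Return a copy of `pod_spec` with PVC `claim` mounted at `mount_path` in `container`."""
    spec = copy.deepcopy(pod_spec)
    volumes = spec.setdefault("volumes", [])
    if any(v["name"] == volume_name for v in volumes):
        raise ValueError(f"volume {volume_name!r} already defined")
    volumes.append({"name": volume_name, "persistentVolumeClaim": {"claimName": claim}})
    for c in spec["containers"]:
        if c["name"] == container:
            mounts = c.setdefault("volumeMounts", [])
            if any(m["name"] == volume_name for m in mounts):
                raise ValueError(f"volumeMount {volume_name!r} already defined")
            # The adapter is then addressable by an absolute path under mount_path.
            mounts.append({"name": volume_name, "mountPath": mount_path})
            return spec
    raise KeyError(f"container {container!r} not found")
```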
It seems this is related to my misconfiguration: I didn't fully clean up the volumeMounts, which caused the error.
> vLLM pods that do not have kvcache pods running on the same node crash.

If a node running engine pods doesn't have a cache pod, the engine pods on it will crash. Affinity...
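One way to avoid this is a hard pod affinity that only lets an engine pod schedule onto a node that already runs a cache pod. A minimal sketch that builds the `affinity` stanza of a pod spec as a Python dict; the selector `app: kvcache` is hypothetical, so substitute whatever labels the cache pods actually carry.

```python
from __future__ import annotations


def kvcache_colocation_affinity(cache_labels: dict[str, str]) -> dict:
    """Build a Kubernetes `affinity` stanza requiring a pod matching `cache_labels`
    on the same node (topologyKey = node hostname)."""
    return {
        "podAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": dict(cache_labels)},  # copy, don't alias
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
```

With `requiredDuringSchedulingIgnoredDuringExecution`, an engine pod with no eligible node stays Pending instead of crashing. Note that it is not evicted if the cache pod later disappears from its node, so the engine still needs to tolerate that case.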
@kdtmac Phase I and Phase II will be done in v0.4.0. Currently, we'd like to reuse `NixlConnector` directly and probably work on a MultiConnector to support both AIBrixKVOffloading and NixlConnector. For...
@ying2025 The router selects the right P/D pair for communication, and we do have some algorithms that can do a better job than the default vLLM/SGLang settings. Framework support focuses more on the...
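As a rough illustration of pair selection (not the actual AIBrix algorithm), a router could restrict candidates to prefill/decode pods that can exchange KV cache with each other and pick the least-loaded combination. The `Worker` type, its `group` field and the load metric are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    role: str   # "prefill" or "decode"
    group: str  # hypothetical: pods in the same group can transfer KV cache to each other
    load: int   # hypothetical load metric, e.g. in-flight requests


def pick_pd_pair(workers: list[Worker]) -> tuple[Worker, Worker]:
    """Pick the (prefill, decode) pair within one group with the lowest combined load."""
    prefills = [w for w in workers if w.role == "prefill"]
    decodes = [w for w in workers if w.role == "decode"]
    candidates = [(p.load + d.load, p, d) for p in prefills for d in decodes if p.group == d.group]
    if not candidates:
        raise LookupError("no prefill/decode pair shares a group")
    # key= avoids comparing Worker objects when two pairs tie on load.
    _, p, d = min(candidates, key=lambda c: c[0])
    return p, d
```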
@ying2025 Sorry for the late reply. It makes sense: prefix cache is considered in the routing decision to reduce TTFT.
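A minimal sketch of prefix-cache-aware scoring, assuming each pod's cached KV blocks are tracked as chained block hashes (so a matching hash for block i implies the whole prefix up to block i matches). The block size, `Pod` type and load tie-break are assumptions for illustration, not the router's actual policy.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # assumption: tokens per cache block (kept tiny for the example)


def block_hashes(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Chained hash per full block; each hash covers the block and everything before it."""
    hashes, prev = [], ""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        h = hashlib.sha256((prev + ":" + ",".join(map(str, block))).encode()).hexdigest()
        hashes.append(h)
        prev = h
    return hashes


@dataclass
class Pod:
    name: str
    load: int  # e.g. in-flight requests
    cached: set[str] = field(default_factory=set)  # block hashes resident in this pod's KV cache


def matched_prefix_blocks(pod: Pod, hashes: list[str]) -> int:
    """Count leading blocks of the prompt already cached on `pod`."""
    n = 0
    for h in hashes:
        if h not in pod.cached:
            break
        n += 1
    return n


def pick_pod(pods: list[Pod], tokens: list[int]) -> Pod:
    """Prefer the longest cached prefix (less prefill work, lower TTFT); break ties by lower load."""
    hashes = block_hashes(tokens)
    return max(pods, key=lambda p: (matched_prefix_blocks(p, hashes), -p.load))
```

Picking purely by prefix match can pile traffic onto one hot pod; a production policy would typically weigh cache hits against load rather than use load only as a tie-breaker.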
There is some remaining work: autoscaling/podgroup and some other community offerings will be supported in a future release. The current P/D orchestration and routing should be good enough for the v0.4.0 release. We can...