
Support multiple Lora adapter replicas

Open Jeffwan opened this issue 1 year ago • 5 comments

🚀 Feature Description and Motivation

In the initial version, to simplify model adapter autoscaling, we decided to support only 1 replica in the CRD. Technically, we should support multiple replicas to allow higher throughput.

Use Case

My production deployment needs higher throughput, and I want multiple lora replicas to be deployed in the environment.

Proposed Solution

  1. Enable replicas in the lora crd
  2. Make sure the scheduling algorithm schedules the lora correctly. We need to handle some special cases, like num of loras <= num of pods. It's meaningless to support > 1 loras on single pod.
  3. (Optional) support lora autoscaling
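The scheduling constraint in step 2 can be sketched as follows. This is only an illustration, not the AIBrix scheduler: the function name `place_replicas`, the `pod_load` mapping (pod name to number of adapters already loaded), and the least-loaded-first ordering are all assumptions made for the example. It shows the two rules stated above: at most one replica of a given lora per pod, and an effective replica count capped by the number of pods.

```python
from typing import Dict, List


def place_replicas(replicas: int, pod_load: Dict[str, int]) -> List[str]:
    """Pick distinct pods for the replicas of one lora adapter.

    Hypothetical helper: `pod_load` maps pod name -> adapters already loaded.
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # At most one replica per pod, so the pod count caps the replica count.
    effective = min(replicas, len(pod_load))
    # Assumed policy: least-loaded pods first, ties broken by name so the
    # result is deterministic.
    ranked = sorted(pod_load, key=lambda name: (pod_load[name], name))
    return ranked[:effective]


if __name__ == "__main__":
    pods = {"pod-a": 3, "pod-b": 0, "pod-c": 1}
    print(place_replicas(2, pods))
    print(place_replicas(5, pods))  # asks for more replicas than pods exist
```

Because each pod name appears at most once in the result, asking for more replicas than there are pods simply yields one replica on every pod.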

Jeffwan avatar Sep 05 '24 14:09 Jeffwan

It's meaningless to support > 1 loras on single pod.

Quick q: did you mean support "< 1 loras on single pod"?

xieus avatar Sep 19 '24 17:09 xieus

@xieus this is a constraint on the scheduling: a single lora model adapter can have no more than 1 replica scheduled to a given pod. 2 replicas on a single pod won't help from the throughput perspective.

Jeffwan avatar Sep 19 '24 18:09 Jeffwan

#205 has become a large change, and I noticed there are some edge cases that need to be covered. I will postpone this feature to rc3.

Jeffwan avatar Sep 24 '24 22:09 Jeffwan

It takes some time to refactor the current code base to improve its extensibility for such changes. I have already moved some of the refactoring changes from #205 to #260. This will be moved to v0.2.0.

Jeffwan avatar Oct 02 '24 23:10 Jeffwan

Moved to a later release due to limited time.

Jeffwan avatar Jan 15 '25 01:01 Jeffwan

This has been supported in https://github.com/vllm-project/aibrix/pull/1132

Jeffwan avatar Aug 30 '25 06:08 Jeffwan

I feel we need to change the design a little bit.

  1. Lora replicas introduce a hierarchy level, and it's hard to manage everything in the model adapter layer. status.phase etc. can only indicate the status of a single replica, not of all replicas.
  2. Lora 1 or all will be much cleaner. This aligns with the original idea in #1132 for supporting multiple replicas.

Once we have enough use cases, we can extend to the hierarchy design.
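To make the two points concrete, here is a rough sketch under stated assumptions. It reads "1 or all" as "load the lora on one pod or on every eligible pod", which is an interpretation of the comment above. The names `resolve_targets` and `summarize_status`, the mode strings, and the phase values ("Running", "Pending", "Degraded") are hypothetical and are not taken from the AIBrix CRD. The second function shows why a single status.phase is lossy once there are several replicas: a summary needs at least ready and desired counts.

```python
from typing import Dict, List


def resolve_targets(mode: str, pods: List[str]) -> List[str]:
    """Hypothetical "1 or all" rule: one pod, or every eligible pod."""
    if mode == "all":
        return list(pods)
    if mode == "one":
        return pods[:1]
    raise ValueError(f"unsupported mode: {mode!r}")


def summarize_status(replica_phases: Dict[str, str]) -> Dict[str, object]:
    """Fold per-pod phases into one adapter-level summary.

    `replica_phases` maps pod name -> phase of the replica on that pod.
    Phase names are illustrative only.
    """
    desired = len(replica_phases)
    ready = sum(1 for phase in replica_phases.values() if phase == "Running")
    if desired > 0 and ready == desired:
        phase = "Running"
    elif ready > 0:
        phase = "Degraded"  # some replicas serve traffic, others do not
    else:
        phase = "Pending"
    return {"phase": phase, "ready": ready, "desired": desired}


if __name__ == "__main__":
    targets = resolve_targets("all", ["pod-a", "pod-b"])
    print(targets)
    print(summarize_status({"pod-a": "Running", "pod-b": "Pending"}))
```

With only the two modes there is no per-replica placement to track, which is the simplification the comment argues for; the hierarchy design would replace `resolve_targets` with an explicit replica count.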

Jeffwan avatar Oct 15 '25 00:10 Jeffwan