aibrix Support multiple Lora adapter replicas

🚀 Feature Description and Motivation

In the initial version, to simplify model adapter autoscaling, we decided to support only 1 replica in the CRD. Technically, we should support multiple replicas to allow higher throughput.

Use Case

My production deployment needs higher throughput, and I want multiple replicas of a LoRA to be deployed in the environment.

Proposed Solution

Enable replicas in the LoRA CRD.
Make sure the scheduling algorithm can correctly schedule the LoRA replicas. We need to handle some special cases, like keeping the number of replicas of a LoRA <= the number of pods. It's meaningless to support > 1 loras on single pod.
(Optional) Support LoRA autoscaling.
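
As a rough illustration of the scheduling constraint above, here is a minimal Python sketch. It is not the aibrix scheduler: `place_replicas`, the pod names, and the "fewest adapters first" ranking are all hypothetical, chosen only to show the clamp to the number of pods and the rule of at most one replica of a given LoRA per pod.

```python
from typing import Dict, List


def place_replicas(
    replicas: int,
    pods: List[str],
    adapters_per_pod: Dict[str, int],
) -> List[str]:
    """Pick distinct pods for one LoRA adapter (hypothetical helper).

    Each pod appears at most once in the result, so a single adapter
    never gets two replicas on the same pod.
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # Drop duplicate pod names while keeping the input order.
    unique_pods = list(dict.fromkeys(pods))
    # Special case from the proposal: never place more replicas than pods.
    target = min(replicas, len(unique_pods))
    # Assumption: prefer pods that hold the fewest adapters; ties break by name.
    ranked = sorted(unique_pods, key=lambda p: (adapters_per_pod.get(p, 0), p))
    return ranked[:target]
```

For example, asking for 5 replicas with only 2 pods available would return both pods once each rather than doubling up on either.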

Sep 05 '24 14:09 Jeffwan

It's meaningless to support > 1 loras on single pod.

Quick q: did you mean support "< 1 loras on single pod"?

Sep 19 '24 17:09 xieus

@xieus this is a constraint on the scheduling. A single LoRA model adapter can have no more than 1 replica scheduled to a given pod. 2 replicas on a single pod won't help from a throughput perspective.

Sep 19 '24 18:09 Jeffwan

#205 has become a large change, and I notice there are some edge cases that still need to be covered. I will postpone this feature to rc3.

Sep 24 '24 22:09 Jeffwan

It takes some time to refactor the current code base to improve extensibility for such changes. I have already moved some of the refactoring changes from #205 to #260. This will be moved to v0.2.0.

Oct 02 '24 23:10 Jeffwan

Moving to a later release due to limited time.

Jan 15 '25 01:01 Jeffwan

This has been supported in https://github.com/vllm-project/aibrix/pull/1132

Aug 30 '25 06:08 Jeffwan

I feel we need to change the design a little bit.

LoRA replicas introduce a hierarchy level, and it's hard to manage everything in the model adapter layer. status.phase etc. can only indicate a single replica's status, not the status of all replicas.
Supporting a LoRA on either 1 or all will be much cleaner. This aligns with the original idea in #1132 for supporting multiple replicas.

Once we have enough use cases, we can extend to the hierarchy design
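
To make the "1 or all" idea concrete, here is a small Python sketch under stated assumptions. It reads "1 or all" as placing the adapter on either one pod or every ready pod, which is an interpretation rather than something the thread spells out. The names `resolve_targets` and `adapter_phase`, the mode strings, and the phase values are hypothetical and not taken from the aibrix CRD.

```python
from typing import List, Set


def resolve_targets(mode: str, ready_pods: List[str]) -> List[str]:
    """Return the pods an adapter should be loaded on (hypothetical modes)."""
    if mode == "one":
        # Single replica: take the first ready pod, if any.
        return ready_pods[:1]
    if mode == "all":
        # Full fan-out: every ready pod gets the adapter.
        return list(ready_pods)
    raise ValueError(f"unknown mode: {mode!r}")


def adapter_phase(targets: List[str], loaded_on: Set[str]) -> str:
    """Collapse per-pod state into one phase without a replica hierarchy."""
    if not targets:
        return "Pending"
    # Assumption: the adapter counts as running only once every target has it.
    if all(pod in loaded_on for pod in targets):
        return "Running"
    return "Pending"
```

With only two modes, a single phase can describe the adapter as a whole, which is the simplification the comment argues for.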

Oct 15 '25 00:10 Jeffwan