[Usage]: Dev instructions for implementing new LoRA features on vLLM
What is the easiest way to implement a new LoRA feature in vLLM?
For example, I want to modify the forward pass of the LoRA model. Referring to the forward pass in PEFT, the original LoRA computation is:
result = result + lora_B(lora_A(dropout(x))) * scaling
Suppose the new feature changes it to:
result = result + lora_B(lora_A(dropout(x))) * scaling + some_linear_A @ some_linear_B + some_func()
Then:
- do I need to modify all of RowParallelLinearWithLoRA, MergedQKVParallelLinearWithLoRA, ColumnParallelLinearWithLoRA, etc.?
- do I need to account for tensor parallelism (TP)?
- should the change go into the torch path, the Triton kernels, or both?
- does the Punica kernel path need changes as well?
I'm looking for the easiest way to implement this when only the LoRA forward pass changes.
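To make the intended change concrete, here is a minimal plain-PyTorch sketch of the modified forward pass. The class name `LoRALinear`, the extra parameters `some_linear_A`/`some_linear_B`, and `some_func` are hypothetical placeholders taken from the formula in the question; this is not vLLM's actual layer hierarchy, which splits the computation across the `*LinearWithLoRA` classes and the Punica kernels.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Sketch of the proposed forward pass in plain PyTorch.

    Mirrors the PEFT-style computation from the question; the extra
    terms (some_linear_A, some_linear_B, some_func) are hypothetical.
    """

    def __init__(self, in_features, out_features, r=8, scaling=1.0, p=0.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        self.dropout = nn.Dropout(p)
        self.scaling = scaling
        # Hypothetical extra parameters from the proposed formula.
        self.some_linear_A = nn.Parameter(torch.randn(out_features, r))
        self.some_linear_B = nn.Parameter(torch.randn(r))

    def some_func(self):
        # Placeholder for the extra additive term in the question.
        return torch.tensor(0.0)

    def forward(self, x):
        result = self.base(x)
        # Original LoRA term: lora_B(lora_A(dropout(x))) * scaling.
        result = result + self.lora_B(self.lora_A(self.dropout(x))) * self.scaling
        # Proposed extra terms: (out, r) @ (r,) gives a per-output-feature
        # vector that broadcasts over the batch dimension.
        result = result + self.some_linear_A @ self.some_linear_B + self.some_func()
        return result
```

Prototyping the math in a standalone module like this first makes it easier to verify the new formula before wiring it into each TP-aware layer class.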
What are the new LoRA features?
We want to use multiple LoRAs inside the same model and compute the sum $\sum_i X A_i B_i$, which is quite a simple case.
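As a reference for that sum, here is a minimal sketch under assumed shapes. The function name `multi_lora_delta` is illustrative; in vLLM the batched multi-LoRA computation is handled by the Punica kernel path rather than a Python loop, so this only pins down the expected math.

```python
import torch


def multi_lora_delta(x, adapters):
    """Compute sum_i X @ A_i @ B_i over several LoRA adapters.

    Assumed shapes (illustrative): x is (batch, in_features); each
    adapter is a pair (A_i, B_i) with A_i of shape (in_features, r_i)
    and B_i of shape (r_i, out_features).
    """
    out_features = adapters[0][1].shape[1]
    delta = torch.zeros(x.shape[0], out_features)
    for A, B in adapters:
        # Multiply by A first so the intermediate stays rank-sized.
        delta = delta + (x @ A) @ B
    return delta
```

Applying all adapters to the same input differs from vLLM's usual multi-LoRA serving, where each request in a batch selects one adapter, so the forward-pass change is the part that needs custom handling.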