[Usage]: Dev instructions for implementing new LoRA features on vLLM
What is the easiest way to implement a new LoRA feature in vLLM?
For example, I want to modify the forward pass of the LoRA model. Referring to the forward pass in PEFT, the original LoRA computation is:
result = result + lora_B(lora_A(dropout(x))) * scaling
Suppose the new feature changes it to:
result = result + lora_B(lora_A(dropout(x))) * scaling + some_linear_A @ some_linear_B + some_func()
Then:
- do I need to modify all of RowParallelLinearWithLoRA, MergedQKVParallelLinearWithLoRA, ColumnParallelLinearWithLoRA, etc.?
- do I need to account for tensor parallelism (TP)?
- should the change go into the torch path, the Triton kernels, or both?
- does the Punica kernel path need changes as well?
I'm looking for the easiest way to implement this when only the LoRA forward pass changes.
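To make the intended change concrete, here is a minimal plain-PyTorch sketch of the modified forward pass. The class name `LoRALinear`, the extra parameters `some_linear_A`/`some_linear_B`, and `some_func` are hypothetical placeholders taken from the formula in the question; this is not vLLM's actual layer hierarchy, which splits the computation across the `*LinearWithLoRA` classes and the Punica kernels.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Sketch of the proposed forward pass in plain PyTorch.

    Mirrors the PEFT-style computation from the question; the extra
    terms (some_linear_A, some_linear_B, some_func) are hypothetical.
    """

    def __init__(self, in_features, out_features, r=8, scaling=1.0, p=0.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        self.dropout = nn.Dropout(p)
        self.scaling = scaling
        # Hypothetical extra parameters from the proposed formula.
        self.some_linear_A = nn.Parameter(torch.randn(out_features, r))
        self.some_linear_B = nn.Parameter(torch.randn(r))

    def some_func(self):
        # Placeholder for the extra additive term in the question.
        return torch.tensor(0.0)

    def forward(self, x):
        result = self.base(x)
        # Original LoRA term: lora_B(lora_A(dropout(x))) * scaling.
        result = result + self.lora_B(self.lora_A(self.dropout(x))) * self.scaling
        # Proposed extra terms: (out, r) @ (r,) gives a per-output-feature
        # vector that broadcasts over the batch dimension.
        result = result + self.some_linear_A @ self.some_linear_B + self.some_func()
        return result
```

Prototyping the math in a standalone module like this first makes it easier to verify the new formula before wiring it into each TP-aware layer class.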
What are the new LoRA features?
We want to use multiple LoRAs inside the same model and compute the sum $\sum_i X A_i B_i$, which is quite a simple case.
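As a reference for that sum, here is a minimal sketch under assumed shapes. The function name `multi_lora_delta` is illustrative; in vLLM the batched multi-LoRA computation is handled by the Punica kernel path rather than a Python loop, so this only pins down the expected math.

```python
import torch


def multi_lora_delta(x, adapters):
    """Compute sum_i X @ A_i @ B_i over several LoRA adapters.

    Assumed shapes (illustrative): x is (batch, in_features); each
    adapter is a pair (A_i, B_i) with A_i of shape (in_features, r_i)
    and B_i of shape (r_i, out_features).
    """
    out_features = adapters[0][1].shape[1]
    delta = torch.zeros(x.shape[0], out_features)
    for A, B in adapters:
        # Multiply by A first so the intermediate stays rank-sized.
        delta = delta + (x @ A) @ B
    return delta
```

Applying all adapters to the same input differs from vLLM's usual multi-LoRA serving, where each request in a batch selects one adapter, so the forward-pass change is the part that needs custom handling.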