
Serving multiple LoRA finetuned LLMs as one

18 punica issues, sorted by recently updated

Everything goes well when I install punica from the binary package. However, I get "ImportError: cannot import name 'BatchedKvCache' from 'punica'" when I run `python -m benchmarks.bench_textgen_lora --system punica --batch-size 32`....
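
One way to narrow this down (a diagnostic sketch, not the maintainers' prescribed fix) is to check which `punica` package Python actually resolves and what it exports, since a local source checkout on the path can shadow the installed binary wheel:

```python
# Sketch: confirm which punica installation gets imported and whether it
# exposes BatchedKvCache; a local source tree can shadow the binary package.
import importlib
import importlib.util

spec = importlib.util.find_spec("punica")
print("punica resolves to:", spec.origin)

punica = importlib.import_module("punica")
print("exported names:", sorted(n for n in dir(punica) if not n.startswith("_")))
```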

I'm running the following code and found that the output is wrong. I initialize `x` and `w` to be all ones, so every output `y` value should be `h1=4096`. But...
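
For reference, a minimal plain-PyTorch check of the expected value (assuming `h1 = h2 = 4096` and all-ones `x` and `w`, as described above):

```python
# Reference check (assumption: x has h1 = 4096 features, w maps h1 -> h2,
# both filled with ones), so every output element should equal h1 = 4096.
import torch

h1, h2 = 4096, 4096
x = torch.ones(1, h1, dtype=torch.float32)
w = torch.ones(h1, h2, dtype=torch.float32)
y = x @ w
assert torch.all(y == h1), y
print(y[0, :4])  # tensor([4096., 4096., 4096., 4096.])
```

Comparing against a float32 reference like this also helps rule out fp16 accumulation artifacts as the source of a mismatch.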

Thank you for your great work! May I ask about some details on the scheduler? 1. In the paper, it is mentioned that "To minimize latency penalty, we limit the prefill...

Hello! Thank you for this awesome work. I am testing `Punica` for serving my custom models, which use GPT-NEOX as the base model. Currently, does `Punica` support other...

Hey folks, awesome and really impactful work with the repo and the paper. I was wondering what the reason was for switching from the original `bgmv` kernel to a CUTLASS-based...
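
For context, the `bgmv` (batched gather matrix-vector) semantics the question refers to can be written as a plain-PyTorch loop; this is only an illustrative reference of the operation, not the repo's kernel:

```python
import torch

def bgmv_reference(y: torch.Tensor, x: torch.Tensor,
                   w_all: torch.Tensor, indices: torch.Tensor,
                   scale: float = 1.0) -> None:
    """Reference semantics: y[i] += scale * x[i] @ w_all[indices[i]].
    Each request i in the batch gathers its own LoRA weight slice."""
    for i in range(x.size(0)):
        y[i] += scale * (x[i] @ w_all[indices[i]])
```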

My environment is:

```
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Arch Linux (x86_64)
GCC version: (conda-forge...
```

Hello, I really want to try the custom expand kernel instead of the CUTLASS version. Has that kernel been pushed out yet?

Hi, I would like to know whether Punica can support different LoRA adapters, such as LoRA adapters of different ranks or on different devices. Thanks, and I hope for your response.
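
As a point of reference only (not Punica's implementation), heterogeneous adapters can always be served with a naive per-adapter fallback like the sketch below, which is the baseline a batched kernel would need to beat; all names here are hypothetical:

```python
import torch

def apply_lora_naive(x, lora_a, lora_b, adapter_ids, scale=1.0):
    """Naive reference: group requests by adapter and apply each adapter's
    low-rank update y = scale * (x @ A) @ B. Ranks may differ per adapter
    because each (A, B) pair is used independently.

    x:            (batch, h1)
    lora_a[k]:    (h1, r_k)   -- rank r_k may differ per adapter
    lora_b[k]:    (r_k, h2)
    adapter_ids:  (batch,) long tensor of adapter indices
    """
    h2 = lora_b[0].size(1)
    y = x.new_zeros(x.size(0), h2)
    for k in torch.unique(adapter_ids):
        mask = adapter_ids == k
        a, b = lora_a[int(k)], lora_b[int(k)]
        y[mask] = scale * (x[mask] @ a) @ b
    return y
```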