punica

Serving multiple LoRA finetuned LLM as one

Results 18 punica issues

The write to y_warpsize should be guarded with threadIdx.x == 0. Otherwise it will lead to the wrong answer.

Any plans to add support for SM75 like V100 GPUs? Thank you!

Hi! I tried running the text generation benchmark with `python -m benchmarks.bench_textgen_lora --system punica --batch-size 32`, but I got a runtime error stating the output should be a...

> Assuming W of shape [H1, H2] is the weight of the pretrained model, LoRA adds two small matrices A of shape [H1, r] and B of [r, H2]. Running...
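
The shapes in the quote above can be made concrete with a small example. The following is a minimal plain-Python sketch, assuming the LoRA output for an input `x` of shape [1, H1] is `x @ W + (x @ A) @ B`. The helper names (`matmul`, `add`, `lora_forward`) and the toy values are hypothetical and chosen only for illustration; this is not Punica's batched CUDA kernel.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_forward(x, W, A, B):
    """Base path x @ W plus the low-rank path (x @ A) @ B."""
    return add(matmul(x, W), matmul(matmul(x, A), B))


# Toy sizes (assumed for illustration): H1 = 3, H2 = 2, r = 1.
x = [[1, 2, 3]]                 # input, shape [1, H1]
W = [[1, 0], [0, 1], [1, 1]]    # pretrained weight, shape [H1, H2]
A = [[1], [0], [2]]             # LoRA matrix A, shape [H1, r]
B = [[1, -1]]                   # LoRA matrix B, shape [r, H2]

y = lora_forward(x, W, A, B)

# The same result comes from merging the adapter into the weight: W + A @ B.
merged = add(W, matmul(A, B))
assert y == matmul(x, merged)
```

Keeping `A` and `B` separate means the low-rank path costs two small products instead of one full [H1, H2] product per adapter, which is why the unmerged form is the one that matters when many adapters share one base weight.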

When I set `TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0"`, I got compilation errors. Then I found: https://github.com/punica-ai/punica/blob/591b59899f0a20760821785d06b331c8a2e5cb86/.github/workflows/release_wheel.yml#L15 Is there something that is not supported yet? Thank you in advance! Update: Adding...

I'm not able to install this library on Colab. I tried this:

```bash
git clone https://github.com/punica-ai/punica
cd punica && pip install .
```

But this is failing with the following...

I would like to know how to use multi-GPU and multi-node setups with the current Punica code. I would also like to know about the runner and scheduler code which is mentioned in...

Hi, Congratulations on the great work you have done! I am very interested in your work. Specifically, I want to know how you allow multiple serving processes to share the...

:robot: I have created a release *beep* *boop*

---

## [1.1.1](https://github.com/punica-ai/punica/compare/v1.1.0...v1.1.1) (2024-01-09)

### Bug Fixes

* **sgmv:** deadlock in sgmv_shrink kernel caused by skewed segments ([#35](https://github.com/punica-ai/punica/issues/35)) ([591b598](https://github.com/punica-ai/punica/commit/591b59899f0a20760821785d06b331c8a2e5cb86))

---

This PR...

autorelease: pending