punica
Serving multiple LoRA finetuned LLM as one
A `threadIdx.x == 0` check should be added when writing `y_warpsize`; otherwise it leads to the wrong answer.
Any plans to add support for SM75 like V100 GPUs? Thank you!
Hi! I tried running the text generation benchmark with `python -m benchmarks.bench_textgen_lora --system punica --batch-size 32`, but I got a runtime error stating the output should be a...
> Assuming W of shape [H1, H2] is the weight of the pretrained model, LoRA adds two small matrices A of shape [H1, r] and B of [r, H2]. Running...
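To make the shapes in the quoted passage concrete, here is a minimal sketch in plain Python. It is not Punica's implementation: the helper names `matmul` and `add`, the input `x`, and the toy sizes are assumptions chosen for illustration. It only checks the algebraic identity that applying the merged weight `W + A·B` gives the same result as applying `W` and the low-rank path `(x·A)·B` separately.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices (hypothetical helper)."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy sizes, assumed for illustration only.
H1, H2, r = 4, 3, 2

W = [[i + j for j in range(H2)] for i in range(H1)]      # pretrained weight, [H1, H2]
A = [[i - j for j in range(r)] for i in range(H1)]       # LoRA matrix A, [H1, r]
B = [[i * j + 1 for j in range(H2)] for i in range(r)]   # LoRA matrix B, [r, H2]
x = [[1, 2, 3, 4]]                                       # one input row, [1, H1]

# Merged path: fold the low-rank update into the weight first.
merged = matmul(x, add(W, matmul(A, B)))

# Separate path: keep W untouched and add the low-rank contribution.
separate = add(matmul(x, W), matmul(matmul(x, A), B))

# Integer arithmetic, so the two paths agree exactly.
assert merged == separate
```

In the separate path the intermediate `x·A` has only `r` columns, so the extra work stays small when `r` is much smaller than `H1` and `H2`.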
When I set `TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0"`, I got compilation errors. Then I found: https://github.com/punica-ai/punica/blob/591b59899f0a20760821785d06b331c8a2e5cb86/.github/workflows/release_wheel.yml#L15 Is there something that is not supported yet? Thank you in advance! Update: Adding...
Thanks!
I'm not able to install this library on Colab. I tried this:

```bash
git clone https://github.com/punica-ai/punica
cd punica && pip install .
```

But this is failing with the following...
I wanted to know how to use multi-GPU and multi-node setups with the current Punica code. I also wanted to know about the runner and scheduler code which is mentioned in...
Hi, Congratulations on the great work you have done! I am very interested in your work. Specifically, I want to know how you allow multiple serving processes to share the...
:robot: I have created a release *beep* *boop*

---

## [1.1.1](https://github.com/punica-ai/punica/compare/v1.1.0...v1.1.1) (2024-01-09)

### Bug Fixes

* **sgmv:** deadlock in sgmv_shrink kernel caused by skewed segments ([#35](https://github.com/punica-ai/punica/issues/35)) ([591b598](https://github.com/punica-ai/punica/commit/591b59899f0a20760821785d06b331c8a2e5cb86))

---

This PR...