[ROCm] Add support for Punica kernels on AMD GPUs
This PR adds ROCm support for the Punica kernels to enable multi-LoRA serving on AMD GPUs. Some Punica files are slightly refactored so that the correct C++/hipcc compilers are invoked when building under ROCm. A custom bgmv shrink kernel is added to account for the difference in warp size between AMD's GPUs (64-lane wavefronts) and Nvidia's (32-lane warps). The port has been tested on an MI210, and the unit tests applying LoRA pass.
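To illustrate the kind of change involved, here is a minimal sketch of warp-size-agnostic kernel code. This is not the PR's actual kernel; `kWarpSize` and `warp_reduce_sum` are hypothetical names introduced for this example, and the sketch assumes a hipcc/nvcc build where `__HIP_PLATFORM_AMD__` distinguishes the two targets:

```cpp
#if defined(__HIP_PLATFORM_AMD__)
  #include <hip/hip_runtime.h>
  constexpr int kWarpSize = 64;  // AMD CDNA GPUs such as the MI210 use 64-lane wavefronts
#else
  #include <cuda_runtime.h>
  constexpr int kWarpSize = 32;  // Nvidia GPUs use 32-lane warps
#endif

// Hypothetical helper: sum `val` across all lanes of a warp/wavefront.
// A kernel written against a hard-coded width of 32 would silently use
// only half the lanes on AMD hardware, which is why the reduction loop
// is parameterized on kWarpSize.
__device__ float warp_reduce_sum(float val) {
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
#if defined(__HIP_PLATFORM_AMD__)
    val += __shfl_down(val, offset);                    // HIP shuffle (no sync mask)
#else
    val += __shfl_down_sync(0xffffffffu, val, offset);  // CUDA 9+ sync shuffle
#endif
  }
  return val;
}
```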
@hongxiayang @lcskrishna Could you help review this PR?
@hongxiayang @dllehr-amd Could you review this PR? It is an important PR that enables multi-LoRA serving on AMD GPUs, a key vLLM feature that many users rely on.
This script can help verify that this works end to end: https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py
will check. Thanks for this effort.
I was going to try this out soon. Is this in a good spot, or is it still being worked on?
> I was going to try this out soon. Is this in a good spot, or is it still being worked on?

It's in a good state for testing, though I'll occasionally be merging in upstream changes to fix conflicts before it gets merged.
@kliuae Sorry for the late review. The PR looks good. Could you please resolve the merge conflict in CMakeLists.txt so that I can merge it? Thanks!
@kliuae Please resolve the latest merge conflict. Your PR is instrumental to our ongoing effort. Thank you very much!
@WoosukKwon Merge conflicts are resolved.