
[ROCm] Add support for Punica kernels on AMD GPUs

kliuae opened this pull request 11 months ago • 7 comments

This PR adds ROCm support for the Punica kernels to enable multi-LoRA serving on AMD GPUs. Some Punica files are slightly refactored so that the correct C++/hipcc compilers are invoked when building under ROCm. A custom bgmv shrink kernel is added to account for the difference in warp size between AMD's GPUs and NVIDIA's. The port has been tested on an MI210, and the unit tests that apply LoRA are passing.

kliuae avatar Mar 01 '24 09:03 kliuae

@hongxiayang @lcskrishna Could you help review this PR?

WoosukKwon avatar Mar 04 '24 18:03 WoosukKwon

@hongxiayang @dllehr-amd Could you review this PR? This is an important PR that enables the AMD GPUs to support multi-LoRA serving, which is a key feature in vLLM liked by many users.

WoosukKwon avatar Mar 13 '24 06:03 WoosukKwon

This script can help verify this works end to end https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py

simon-mo avatar Mar 28 '24 17:03 simon-mo

will check. Thanks for this effort.

hongxiayang avatar Mar 29 '24 22:03 hongxiayang

I was going to try this out soon, is this in a good spot or is it still being worked on?

jamestwhedbee avatar Apr 23 '24 16:04 jamestwhedbee

> I was going to try this out soon, is this in a good spot or is it still being worked on?

It's in a good state for testing, though occasionally I'll be merging in upstream changes to fix conflicts before it gets merged.

kliuae avatar Apr 25 '24 06:04 kliuae

@kliuae Sorry for the late review. The PR looks good. Could you please resolve the merge conflict in CMakeLists.txt so that I can merge it? Thanks!

@kliuae Please resolve the latest merge conflict. Your PR is instrumental to our ongoing effort. Thank you very much!

Alexei-V-Ivanov-AMD avatar May 02 '24 23:05 Alexei-V-Ivanov-AMD

@WoosukKwon Merge conflicts are resolved

kliuae avatar May 08 '24 09:05 kliuae