
Results: 31 comments of Saeed

@hongxiayang, even though that page says MI50 support is deprecated, will AMD keep supporting our cards in the near future? I hope so. There are tens of thousands of...

@mgoin, can you please check this PR? I fixed the merge conflict, but some tests are failing. Those tests are not related to this PR as far as I can...

@mgoin, after some experimentation, I was able to add my DCO sign-off and fix the pre-commit error. I think this should be ready to go.
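
For anyone hitting the same DCO check, a minimal sketch of retroactively signing off existing commits (the commit count and branch name here are assumptions for illustration, not from the PR):

```bash
# Sign off the last N commits on the PR branch (here: 3; adjust to your history)
git rebase HEAD~3 --signoff

# Force-push the rewritten history
# (branch name "my-feature" is hypothetical)
git push --force-with-lease origin my-feature
```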

@hongxiayang, please review this PR when you have time. Thanks!

@mgoin, can you please help me remove the other reviewers? I previously rebased this PR incorrectly, and that assigned 14 more reviewers. Originally, this PR was...

Hello @skyne98, I do not have experience building packages using docker. It looks like vllm is trying to use hipBLASLt, but hipBLASLt does not support MI50/60. Did you build hipBLASLt...

Hello @mgoin, it looks like there have been some changes in the repo, and I no longer see the GPU_ARCH variable or the use_rocm_custom_paged_attention function. Did they move it to some...
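
A quick way to track down where such symbols went — these are generic git/grep commands, not anything specific to the vllm repo layout:

```bash
# Search the current working tree for the symbol
grep -rn "use_rocm_custom_paged_attention" .

# Find commits that added or removed the symbol (git "pickaxe" search)
git log -S "use_rocm_custom_paged_attention" --oneline
```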

@skyne98, which model did you run in vllm? I recommend GPTQ AutoRound 4-bit quantization for MI50/60. Try out this quantization of Llama 3.3 70B: https://huggingface.co/kaitchup/Llama-3.3-70B-Instruct-AutoRound-GPTQ-4bit. I got around 20 t/s for...
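
A minimal sketch of serving that model across two cards; the tensor-parallel size of 2 matches the 2xMI60 setup mentioned in these comments, and the rest of the flags are assumptions, not taken from the original thread:

```bash
# Serve the GPTQ AutoRound quant across 2 GPUs
# (tensor-parallel size assumed from the 2xMI60 setup)
vllm serve kaitchup/Llama-3.3-70B-Instruct-AutoRound-GPTQ-4bit \
  --quantization gptq \
  --tensor-parallel-size 2
```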

@skyne98, I think fp16 inference in vllm is not optimized for MI50/60. With llama3 8b fp16, I was getting ~14 t/s using 2xMI60. Also, I recommend you use "--dtype...
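
The --dtype value above is cut off; float16 is only my assumption of the intended value, and the model name below is just an example. Passing the flag to the server looks like this:

```bash
# --dtype value is an assumption (the original comment is truncated);
# the model name is illustrative only
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --dtype float16 \
  --tensor-parallel-size 2
```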

@skyne98, what version of vllm are you using? I had success running QwQ-32B-Preview-GPTQ-4bit at 35 t/s with 2xMI60. If your vllm version is older, it may not support QwQ...
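
To check which version is installed (standard pip/Python usage, nothing vllm-specific beyond the package name):

```bash
# Print the installed vllm version
python -c "import vllm; print(vllm.__version__)"
# or
pip show vllm
```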