[Feature]: FlashInfer + Gemma 2 for AMD GPU
This may be a question rather than a feature request.

FlashInfer is not supported on AMD GPUs, and support is not planned until a later version. Is there a way to run Gemma 2 models on AMD? I keep getting:

```
ValueError: Please use Flashinfer backend for models with logits_soft_cap (i.e., Gemma-2). Otherwise, the output might be wrong. Set Flashinfer backend by export VLLM_ATTENTION_BACKEND=FLASHINFER.
```

even though I set the environment variable. I also tried removing that validation as an experiment, but it then fails with a `NoneType` error caused by the FlashInfer import error.
Or is there an alternative that can be used?
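For reference, this is roughly how I am setting the backend — a minimal sketch, assuming the environment variable must be set before vLLM is imported (the model name below is just an illustration):

```python
import os

# VLLM_ATTENTION_BACKEND is read when the engine initializes, so set it
# before importing vllm rather than after.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

# The actual model load then fails on AMD because FlashInfer cannot be
# imported there (commented out since it requires a FlashInfer build):
# from vllm import LLM
# llm = LLM(model="google/gemma-2-9b")

print(os.environ["VLLM_ATTENTION_BACKEND"])
```

Setting the same variable via `export VLLM_ATTENTION_BACKEND=FLASHINFER` in the shell before launching the server behaves the same way for me.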
Alternatives
No response
Additional context
No response