[Feature]: FlashInfer + Gemma 2 for AMD GPU
This may be a question rather than a feature request.

FlashInfer is not supported on AMD GPUs, and support is not planned until a later version. Is there a way to run Gemma 2 models on AMD? I keep getting:

```
ValueError: Please use Flashinfer backend for models with logits_soft_cap (i.e., Gemma-2). Otherwise, the output might be wrong. Set Flashinfer backend by export VLLM_ATTENTION_BACKEND=FLASHINFER.
```

even though I set the environment variable. I also tried removing that validation as an experiment, but it then fails with a `NoneType` error caused by the FlashInfer import error.
Or is there an alternative that can be used?
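For reference, this is roughly how I am setting the backend — a minimal sketch, assuming the environment variable must be set before vLLM is imported (the model name below is just an illustration):

```python
import os

# VLLM_ATTENTION_BACKEND is read when the engine initializes, so set it
# before importing vllm rather than after.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

# The actual model load then fails on AMD because FlashInfer cannot be
# imported there (commented out since it requires a FlashInfer build):
# from vllm import LLM
# llm = LLM(model="google/gemma-2-9b")

print(os.environ["VLLM_ATTENTION_BACKEND"])
```

Setting the same variable via `export VLLM_ATTENTION_BACKEND=FLASHINFER` in the shell before launching the server behaves the same way for me.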
Alternatives
No response
Additional context
No response