VILA
Inference on older microarchitectures: issue with FlashAttention
Hello everyone, thanks for this amazing work!
I'm trying to run inference with the NVILA-8B model on an NVIDIA V100 GPU but am running into an issue. I understand from the model requirements that NVILA only supports certain microarchitectures, but running inference on the V100 is a strict requirement for me.
Any solution for running inference on an NVIDIA V100 would be of great help, so thanks in advance!

This is the error I'm getting:

RuntimeError: FlashAttention only supports Ampere GPUs or newer.
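For context, FlashAttention's CUDA kernels require a GPU with compute capability 8.0 (Ampere) or newer, while the V100 (Volta) reports 7.0, which is why this check fails. A quick way to confirm what your device reports:

```python
import torch

# FlashAttention requires compute capability >= 8.0 (Ampere or newer);
# the V100 (Volta) reports 7.0, so FlashAttention's hardware check fails.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
print(f"FlashAttention supported: {major >= 8}")
```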
We do not have V100 hardware to test on. You may try replacing FlashAttention with another attention implementation.
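For anyone hitting the same wall: if the checkpoint is loaded through Hugging Face transformers, the `attn_implementation` argument can select a backend other than FlashAttention, such as PyTorch's SDPA or the plain eager implementation, both of which run on Volta. This is a minimal sketch, not the repo's actual loading path; the checkpoint id is illustrative, and VILA's own loader may or may not pass this kwarg through to the language backbone.

```python
# Minimal sketch, assuming the language backbone is loaded via Hugging Face
# transformers. The checkpoint id and loading path are illustrative; VILA's
# own loader may handle these kwargs differently.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Efficient-Large-Model/NVILA-8B",  # illustrative checkpoint id
    attn_implementation="sdpa",        # or "eager"; both avoid FlashAttention kernels
    torch_dtype="auto",
    trust_remote_code=True,            # the model ships custom modeling code
)
```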
Thanks for the quick and helpful response. Could you point me to the code file where FlashAttention is used, so that I can replace it with an earlier implementation? Thanks in advance.