insanely-fast-whisper
Getting `Use model.to('cuda')` warning when trying to use Flash Attention
I installed all the necessary drivers and packages, including nvcc to build Flash Attention; everything went smoothly. However, when I run
insanely-fast-whisper --file-name audio.ogg --flash True
I get the following warning:
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')
The transcription still works, but, I assume, without Flash Attention.
I tried it on Ubuntu 22.04.4 LTS, both on a 24-core, 32 GB machine with an RTX 4090 (24 GB) and on an A100 (80 GB).
What did I miss?
PS: Maybe I should add that this is an awesome project. Thank you.
cc @Vaibhavs10
Got the same issue...
I am also facing the same issue. Were any of you able to fix it? Thanks!
insanely-fast-whisper --task transcribe --flash True --device-id 0 --file-name audio.mp3
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')
How do I initialize the model on the GPU?