
Getting `Use model.to('cuda')` warning when trying to use Flash Attention

Open eburgwedel opened this issue 10 months ago • 4 comments

I installed all the necessary drivers and packages, including nvcc to build Flash Attention; all smooth sailing. Everything works, but when I run

insanely-fast-whisper --file-name audio.ogg --flash True

I get the following warning:

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')

The transcription still works, but, I assume, without Flash Attention.

I tried it on Ubuntu 22.04.4 LTS, both on a 24-core/32 GB machine with an RTX 4090 24GB and on an A100 80GB.

What did I miss?

PS: Maybe I should add that this is an awesome project. Thank you.

eburgwedel avatar Apr 18 '24 12:04 eburgwedel

@Vaibhavs10 CC

peterschmidt85 avatar May 19 '24 08:05 peterschmidt85

Got the same issue...

Natalie-Caruana avatar Aug 28 '24 08:08 Natalie-Caruana

I'm also facing the same issue. Was anyone able to fix it? Thanks!

cyb3rh4wk avatar Sep 15 '24 23:09 cyb3rh4wk

insanely-fast-whisper --task transcribe --flash True --device-id 0 --file-name audio.mp3

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').

How to initialize model on GPU?
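For anyone hitting this with their own transformers code rather than the CLI: the warning comes from loading the model on CPU first and only later calling `.to('cuda')`. A minimal sketch of the idea, placing the model on the GPU at load time while requesting Flash Attention 2 (the model name and kwargs here are my assumptions about a typical Whisper setup, not taken from this repo's source):

```python
def whisper_pipeline_kwargs(use_flash: bool, cuda_available: bool) -> dict:
    """Build kwargs for transformers.pipeline() so the model is created
    directly on the GPU, which is what the warning asks for.

    Sketch only: assumes transformers >= 4.36 and flash-attn installed."""
    kwargs = {
        "task": "automatic-speech-recognition",
        "model": "openai/whisper-large-v3",  # assumed model, swap as needed
        # Initializing with device set avoids the model.to('cuda') warning.
        "device": "cuda:0" if cuda_available else "cpu",
    }
    if use_flash and cuda_available:
        # Flash Attention 2 must be requested at load time, not after.
        kwargs["model_kwargs"] = {"attn_implementation": "flash_attention_2"}
    return kwargs


# Usage (only meaningful on a CUDA machine with flash-attn built):
# import torch
# from transformers import pipeline
# pipe = pipeline(torch_dtype=torch.float16,
#                 **whisper_pipeline_kwargs(use_flash=True, cuda_available=True))
# print(pipe("audio.mp3")["text"])
```

If the warning still appears, it usually means the model was constructed before the device was set, so checking that no intermediate `from_pretrained(...)` call runs without a `device_map`/`device` argument is a good first step.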

bmili avatar Sep 20 '24 20:09 bmili