
Getting `Use model.to('cuda')` warning when trying to use Flash Attention

Open eburgwedel opened this issue 10 months ago • 4 comments

I installed all the necessary drivers and packages, including nvcc to build Flash Attention; all smooth sailing. Everything works, but when I run

insanely-fast-whisper --file-name audio.ogg --flash True

I get the following warning:

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda')

The transcription still works, but, I assume, without Flash Attention.

I tried it on Ubuntu 22.04.4 LTS, both on a 24-core/32 GB machine with an RTX 4090 24GB and on an A100 80GB.

What did I miss?

PS: Maybe I should add that this is an awesome project. Thank you.

eburgwedel avatar Apr 18 '24 12:04 eburgwedel

@Vaibhavs10 CC

peterschmidt85 avatar May 19 '24 08:05 peterschmidt85

Got the same issue...

Natalie-Caruana avatar Aug 28 '24 08:08 Natalie-Caruana

I'm also facing the same issue. Was anyone able to fix it? Thanks!

cyb3rh4wk avatar Sep 15 '24 23:09 cyb3rh4wk

insanely-fast-whisper --task transcribe --flash True --device-id 0 --file-name audio.mp3

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').

How to initialize model on GPU?
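For anyone hitting this with their own transformers code rather than the CLI: the warning comes from loading the model on CPU first and only later calling `.to('cuda')`. A minimal sketch of the idea, placing the model on the GPU at load time while requesting Flash Attention 2 (the model name and kwargs here are my assumptions about a typical Whisper setup, not taken from this repo's source):

```python
def whisper_pipeline_kwargs(use_flash: bool, cuda_available: bool) -> dict:
    """Build kwargs for transformers.pipeline() so the model is created
    directly on the GPU, which is what the warning asks for.

    Sketch only: assumes transformers >= 4.36 and flash-attn installed."""
    kwargs = {
        "task": "automatic-speech-recognition",
        "model": "openai/whisper-large-v3",  # assumed model, swap as needed
        # Initializing with device set avoids the model.to('cuda') warning.
        "device": "cuda:0" if cuda_available else "cpu",
    }
    if use_flash and cuda_available:
        # Flash Attention 2 must be requested at load time, not after.
        kwargs["model_kwargs"] = {"attn_implementation": "flash_attention_2"}
    return kwargs


# Usage (only meaningful on a CUDA machine with flash-attn built):
# import torch
# from transformers import pipeline
# pipe = pipeline(torch_dtype=torch.float16,
#                 **whisper_pipeline_kwargs(use_flash=True, cuda_available=True))
# print(pipe("audio.mp3")["text"])
```

If the warning still appears, it usually means the model was constructed before the device was set, so checking that no intermediate `from_pretrained(...)` call runs without a `device_map`/`device` argument is a good first step.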

bmili avatar Sep 20 '24 20:09 bmili