
New library possibly faster than Jax or just a hoax?

Open BBC-Esq opened this issue 1 year ago • 0 comments

Has anyone seen this repository? https://github.com/Vaibhavs10/insanely-fast-whisper

It makes the incredible claim that it's approximately 6x faster than faster-whisper. I checked the GitHub repo and they don't make the source code available (even though there's a "src" folder), but there is a library on PyPI that you can install named "insanely-fast-whisper", located at https://pypi.org/project/insanely-fast-whisper/.

Apparently, you can use it either with or without FlashAttention2... I couldn't get FlashAttention2 to install, though.
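(For anyone who gets further with it than I did: if I'm reading the transformers docs right, enabling it should look roughly like the sketch below. The flag name and the half-precision requirement are my assumptions from the docs, not something I've verified.)

import torch
from transformers import pipeline

# Hypothetical sketch: enabling FlashAttention2 through the transformers
# pipeline. Assumes a transformers version that accepts the
# "use_flash_attention_2" model kwarg and that flash-attn installed cleanly.
# FlashAttention2 also requires half precision (float16/bfloat16), not float32.
pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float16,  # FA2 does not support float32
                device="cuda:0",
                model_kwargs={"use_flash_attention_2": True})  # assumed flag name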

Does Faster-Whisper use FlashAttention2? Does anyone know what the backend is? Is it using Faster-Whisper, perchance, with the only difference being the "batch_size" parameter that allows it to process more segments of the audio file at once?

Even when I change the batch_size to 1, however, it still runs approximately 2x faster than faster-whisper, NOT the 6x it claims.

My test was as follows...

Using Faster-Whisper: the large-v2 model in float32 (converted to ctranslate2 format, of course).
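For reference, the faster-whisper side of the test was essentially the standard setup; a minimal sketch (the model name and audio path here are placeholders, not my actual paths):

from faster_whisper import WhisperModel

# Load large-v2 in float32 on the GPU (ctranslate2 format)
model = WhisperModel("large-v2", device="cuda", compute_type="float32")

# Transcribe; segments is a generator, so iterate to force the full run
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(segment.text)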

For "insanely-faster-whisper" I transcribed the same audio file...and here's the relevant portion of my script:

import torch
from transformers import pipeline

# Initialize the pipeline: large-v2 in float32 on the GPU,
# matching the faster-whisper test
pipe = pipeline("automatic-speech-recognition",
                "openai/whisper-large-v2",
                torch_dtype=torch.float32,
                device="cuda:0")

# Swap in the BetterTransformer (fused attention) implementation
pipe.model = pipe.model.to_bettertransformer()

# Process the audio file in 30-second chunks, one chunk at a time
outputs = pipe("[REMOVED PATH TO FILE FOR PRIVACY REASONS]",
               chunk_length_s=30,
               batch_size=1,
               return_timestamps=True)
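(If anyone wants to reproduce the comparison, a bare-bones timing sketch like this is roughly what I mean by "2x faster"; the file names are placeholders, and the warm-up call is just so model load and CUDA initialization don't count against the measured run:)

import time

# Hypothetical warm-up on a short clip before timing
pipe("warmup.wav")

start = time.perf_counter()
outputs = pipe("audio.wav",
               chunk_length_s=30,
               batch_size=1,
               return_timestamps=True)
elapsed = time.perf_counter() - start
print(f"transcription took {elapsed:.1f}s")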

Again, even though the batch size was "1", it was still approximately 2x as fast. Now, I didn't get a chance to test accuracy, but... anyone know what this library is based on???

BBC-Esq · Nov 15 '23