whisper-jax
New library possibly faster than Jax or just a hoax?
Has anyone seen this repository? https://github.com/Vaibhavs10/insanely-fast-whisper
It makes the incredible claim that it's approximately 6x faster than faster-whisper. I checked the GitHub repo and they don't make the source code available (even though there's a "src" folder), but there is a library on PyPI that you can install named "insanely-fast-whisper": https://pypi.org/project/insanely-fast-whisper/.
Apparently, you can use it either with or without FlashAttention2...I couldn't get FlashAttention2 to install.
Does faster-whisper use FlashAttention2? Does anyone know what the backend is? Is it using faster-whisper under the hood, with the only difference being the "batch_size" parameter that lets it process multiple segments of the audio file at once?
Even when I set batch_size to 1, however, it still runs approximately 2x faster than faster-whisper, not the 6x it claims.
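For anyone wondering what "batch_size" does conceptually, here's a hypothetical illustration (not the library's actual code, and `make_chunks`/`make_batches` are made-up names): the long audio file is split into fixed-length chunks (e.g. 30 s), and `batch_size` chunks are pushed through the model in a single forward pass.

```python
def make_chunks(num_samples, sample_rate=16000, chunk_length_s=30):
    """Split an audio signal of `num_samples` samples into (start, end) chunk boundaries."""
    chunk = sample_rate * chunk_length_s
    return [(start, min(start + chunk, num_samples))
            for start in range(0, num_samples, chunk)]

def make_batches(chunks, batch_size):
    """Group chunks so each batch is one forward pass through the model."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]

# A 95-second clip at 16 kHz -> four 30 s chunks (the last one shorter).
chunks = make_chunks(95 * 16000)
print(len(chunks))                          # 4 chunks
print(len(make_batches(chunks, batch_size=2)))  # 2 batches of 2 chunks
print(len(make_batches(chunks, batch_size=1)))  # 4 batches of 1 chunk
```

With batch_size=1 every chunk still goes through the model one at a time, which is why a remaining 2x speedup would have to come from something other than batching.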
My test was as follows...
Using Faster-Whisper...
large-v2
model in float32 (in CTranslate2 format, of course)
For "insanely-fast-whisper" I transcribed the same audio file. Here's the relevant portion of my script:
import torch
from transformers import pipeline

# Initialize the pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    "openai/whisper-large-v2",
    torch_dtype=torch.float32,
    device="cuda:0",
)
pipe.model = pipe.model.to_bettertransformer()

# Process the audio file
outputs = pipe(
    "[REMOVED PATH TO FILE FOR PRIVACY REASONS]",
    chunk_length_s=30,
    batch_size=1,
    return_timestamps=True,
)
Again, even though batch_size was 1, it was still approximately 2x as fast. I haven't had a chance to test accuracy yet, though. Does anyone know what this library is based on?