insanely-fast-whisper
Nearly 2x performance difference with another Docker image on the same system
Hi, thanks for your work on replicate.com.

I'm using your Docker image r8.im/vaibhavs10/incredibly-fast-whisper and another image, yoeven/insanely-fast-whisper-api, built from JigsawStack/insanely-fast-whisper-api (which includes a Dockerfile).

On the same RTX 4090 system, with the same audio and the same parameters:
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)
generate_kwargs = {
    "task": "transcribe",
    "language": "chinese",
    "repetition_penalty": 1.25,
}
# url points to the audio file being transcribed
outputs = pipe(
    url,
    chunk_length_s=30,
    batch_size=20,
    generate_kwargs=generate_kwargs,
    return_timestamps=True,
)
```
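The transcribe times below are wall-clock around the `pipe()` call, measured roughly like this (a minimal sketch; the exact harness is my own, not anything special):

```python
import time

start = time.perf_counter()
outputs = pipe(
    url,
    chunk_length_s=30,
    batch_size=20,
    generate_kwargs=generate_kwargs,
    return_timestamps=True,
)
print(f"transcribe time: {time.perf_counter() - start:.1f}s")
```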
After several runs, I get quite different performance results (the transcription outputs are identical):
| | yours | the other one |
|---|---|---|
| transcribe time | 72 s | 42 s |
| average GPU usage (gpustat) | 22% | 40% |
| average GPU memory | 10679 MB | 10690 MB |
| torch version | 2.0.2+cu118 | 2.2.0a0+81ea7a4 |
| CUDA version | cuda_11.8.r11.8 | cuda_12.3.r12.3 |
| decompressed image size | 16.9 GB | 34.7 GB |
I originally tried your image because of the image size difference. Could you share the Dockerfile, so we can test whether the performance gap comes from the CUDA version? I'm assuming you build from NVIDIA base images, which would make the CUDA version easy to swap; see the sketch below.
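For anyone who wants to experiment before the actual Dockerfile is available, here is a minimal sketch under that assumption. The base-image tag, package list, and entrypoint script name are all guesses on my part, not the real replicate.com build:

```dockerfile
# Hypothetical build sketch -- NOT the actual r8.im image.
# Swap the base tag (e.g. 11.8.0 vs 12.1.1) to compare CUDA versions.
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# PyTorch wheels bundle their own CUDA runtime; pick the wheel index
# that matches the base image (cu118 for 11.8, cu121 for 12.1).
RUN pip3 install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cu121 \
    && pip3 install --no-cache-dir transformers accelerate \
    # flash-attn compiles against nvcc, hence the -devel base image
    && pip3 install --no-cache-dir flash-attn --no-build-isolation

WORKDIR /app
# transcribe.py is a placeholder for the pipeline script above
COPY transcribe.py .
CMD ["python3", "transcribe.py"]
```

Building two variants of this image that differ only in the base tag and torch wheel index should isolate the CUDA/torch contribution from everything else in the stack.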
Are you running this on your local machine? I'd like to reproduce the test. Could you post your docker run command or docker compose YAML, so I can see how you're passing in your GPU, etc.? Something like the sketch below.
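For reference, cog-built replicate images are usually started along these lines; the port mapping and the "audio" input field name here are my assumptions, not taken from the post above:

```bash
# Expose the GPU to the container and map the cog HTTP server's port.
docker run -d --gpus all -p 5000:5000 r8.im/vaibhavs10/incredibly-fast-whisper

# Send a prediction request; "audio" is the assumed input field name.
curl -s -X POST http://localhost:5000/predictions \
  -H "Content-Type: application/json" \
  -d '{"input": {"audio": "https://example.com/sample.mp3"}}'
```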