
Nearly double the performance with another Docker image on the same system

Open · lordofriver opened this issue 9 months ago · 1 comment

Hi, thanks for your work on replicate.com.

I'm using your Docker image r8.im/vaibhavs10/incredibly-fast-whisper and another image, yoeven/insanely-fast-whisper-api, from JigsawStack/insanely-fast-whisper-api (which ships a Dockerfile).

On the same RTX 4090 system, with the same audio and the same parameters below:

    import torch
    from transformers import pipeline

    # Build the ASR pipeline with flash-attention 2 on the first GPU.
    pipe = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",
        torch_dtype=torch.float16,
        device="cuda:0",
        model_kwargs={"attn_implementation": "flash_attention_2"},
    )
    generate_kwargs = {
        "task": "transcribe",
        "language": "chinese",
        "repetition_penalty": 1.25,
    }
    outputs = pipe(
        url,
        chunk_length_s=30,
        batch_size=20,
        generate_kwargs=generate_kwargs,
        return_timestamps=True,
    )

After several tests, I get quite different performance results (the outputs are the same).

|  | yours (incredibly-fast-whisper) | the other one (insanely-fast-whisper-api) |
| --- | --- | --- |
| transcribe time | 72 s | 42 s |
| average GPU usage (gpustat) | 22% | 40% |
| average GPU memory | 10679 MiB | 10690 MiB |
| torch version | 2.0.2+cu118 | 2.2.0a0+81ea7a4 |
| CUDA version | cuda_11.8.r11.8 | cuda_12.3.r12.3 |
| decompressed image size | 16.9 GB | 34.7 GB |
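The transcribe times above come from a simple wall-clock wrapper along these lines (a sketch; `timed` is a hypothetical helper, and the real measurement wraps the `pipe(url, ...)` call from the snippet above):

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload for illustration; the real benchmark would wrap
# pipe(url, chunk_length_s=30, batch_size=20, ...) instead.
result, elapsed = timed(sum, range(1000))
print(f"result={result}, took {elapsed:.6f}s")
```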

I tried your image because of the image size difference. Could you share the Dockerfile, so I can test whether the performance difference is related to the CUDA version? (I assume you build from NVIDIA base images, so the CUDA version would be easy to change.)
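For what it's worth, a build along these lines would make the CUDA version easy to vary. This is a hypothetical sketch, not the actual Dockerfile of either image; the base-image tag, package list, and `predict.py` entrypoint are all assumptions:

```dockerfile
# Hypothetical sketch, not the real build file for either image.
# Testing the CUDA hypothesis should only require changing this tag,
# e.g. to an nvidia/cuda:12.x runtime tag.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# The torch wheel must match the CUDA version of the base image.
RUN pip3 install --no-cache-dir \
        torch --index-url https://download.pytorch.org/whl/cu118 \
    && pip3 install --no-cache-dir transformers accelerate

COPY predict.py /app/predict.py
CMD ["python3", "/app/predict.py"]
```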

lordofriver · May 10 '24 10:05

Are you running this on your local machine? I want to do the same and test it. Could you post your docker run command or docker compose YAML so I can see how you're passing in your GPU, etc.?
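For reference, I mean something like this (the image tag and flags here are placeholders for whatever you're actually running):

```shell
# Placeholder example; requires the NVIDIA Container Toolkit on the host.
docker run --rm --gpus all \
    -p 8000:8000 \
    yoeven/insanely-fast-whisper-api:latest
```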

ihaddy · Jun 05 '24 17:06