Docker image for running whisper-ctranslate2
Hello, I just wanted to start a discussion about running whisper-ctranslate2 in Docker. Referencing faster-whisper issue #109, I came up with the following Dockerfile, which works:
# Use Ubuntu as base
FROM ubuntu:20.04
# Alternatively, use a base image with CUDA and cuDNN support
# FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04
# Install necessary dependencies
RUN apt-get update && apt-get install -y python3-pip
# Set the working directory
WORKDIR /app
# Copy the app code and requirements file
COPY . /app
# Install dependencies
RUN pip3 install --no-cache-dir -r requirements.txt
# Install whisper-ctranslate2
RUN pip3 install -U whisper-ctranslate2
# Set the entry point
ENTRYPOINT ["whisper-ctranslate2"]
Build with: docker build -t asr .
Run with: docker run --rm -v /path/to/folder:/app --gpus '"device=0,1"' asr myfile.mp3 --compute_type int8
My observations are:
- If the entrypoint is set, the container will not show transcribed lines as it runs; the results are only printed afterwards.
- However, the progress is shown if you run the Docker container (without the entrypoint) in interactive mode, as in the example below.
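For example (untested here, adjust the paths to your setup), you can override the entrypoint to get a shell and then run the tool manually to watch the progress; presumably this works because a TTY is attached, so the output is no longer block-buffered:
docker run --rm -it --entrypoint /bin/bash -v /path/to/folder:/app --gpus '"device=0,1"' asr
# then, inside the container:
whisper-ctranslate2 myfile.mp3 --compute_type int8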
Correct me if I'm wrong, but is that repository literally just a wrapper around faster_whisper? I'm confused why it's called insanely-fast-whisper.
I'm no expert but here's what it looks like to me. Compared to openai/whisper (or faster_whisper), insanely-fast-whisper:
- Uses Whisper models which are in the 🤗 transformers format
- Supports batching
- Supports 🤗 bettertransformer
- Supports the new 🤗 distil-whisper models (which are much faster)
Using one or all of these features leads to faster transcriptions.
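To make that concrete, here is a minimal sketch of what those features look like through the 🤗 transformers ASR pipeline. The checkpoint name, chunk length, and batch size below are placeholders I picked for illustration, not values taken from insanely-fast-whisper itself:
# Rough sketch (not tested here) of combining the features listed above.
import torch
from transformers import pipeline

# Load a Whisper/distil-whisper checkpoint in 🤗 transformers format
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Swap in BetterTransformer kernels (requires the optimum package)
pipe.model = pipe.model.to_bettertransformer()

# Batched, chunked long-form transcription
result = pipe("myfile.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])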
Hm, so to answer OP's question: no, it shouldn't be too much work. In fact, point 3 is one line of code, and points 1 and 3 use the same code to load Hugging Face models. I believe batching is already handled to some extent when using this webservice, depending on your application. Adding a batching option also shouldn't be hard, though. I can take a crack at it.
Did anything come of this? The benchmarks posted by insanely-fast-whisper are hugely impressive versus just faster-whisper (1min18s for 150 minutes of audio with insanely-fast-whisper versus 8min15s for faster-whisper).
Sorry, got busy. I'm sure speech research has progressed a lot since this issue was opened, and I've got a little time on my hands now. Are there newer/better/faster models already supported or requested?