
Docker image for running whisper-ctranslate2

Open extremelyonline opened this issue 2 years ago • 5 comments

Hello, just wanted to start a discussion about running whisper-ctranslate2 in Docker. Referencing #109 of faster-whisper, I came up with the following Dockerfile, which works.

# Use Ubuntu as base
FROM ubuntu:20.04

# Alternatively, use a base image with CUDA and cuDNN support
# FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04

# Install necessary dependencies
RUN apt-get update && apt-get install -y python3-pip

# Set the working directory
WORKDIR /app

# Copy the app code and requirements file
COPY . /app

# Install dependencies
RUN pip3 install --no-cache-dir -r requirements.txt

# Install whisper-ctranslate2
RUN pip3 install -U whisper-ctranslate2

# Set the entry point
ENTRYPOINT ["whisper-ctranslate2"]

Build with:

docker build -t asr .

Run with:

docker run --rm -v /path/to/folder:/app --gpus '"device=0,1"' asr myfile.mp3 --compute_type int8

Note that the --gpus flag requires the NVIDIA Container Toolkit on the host, and GPU inference also needs the CUDA/cuDNN base image commented out in the Dockerfile above.

My observations are:

  • If the entrypoint is set, the container will not show transcribed lines as it runs; the results are only printed after the run finishes.
  • However, the progress can be shown if you run the Docker container (without the entrypoint) in interactive mode, as sketched below.

extremelyonline avatar Dec 17 '23 20:12 extremelyonline
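For reference, a minimal sketch of that interactive workaround, assuming the image was built as asr above (the /bin/bash override is illustrative, not part of the original report):

# Override the entrypoint and allocate a TTY (-it) so transcribed lines stream as they are produced
docker run --rm -it --gpus '"device=0,1"' -v /path/to/folder:/app --entrypoint /bin/bash asr

# Then, inside the container:
whisper-ctranslate2 myfile.mp3 --compute_type int8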

Correct me if I'm wrong, but is that repository literally just a wrapper around faster_whisper? I'm confused about why it's called insanely-fast-whisper.

AnkushMalaker avatar Nov 02 '23 23:11 AnkushMalaker

Correct me if I'm wrong, but is that repository literally just a wrapper around faster_whisper? I'm confused about why it's called insanely-fast-whisper.

I'm no expert, but here's what it looks like to me. Compared to openai/whisper (or faster_whisper), insanely-fast-whisper:

  1. Uses Whisper models which are in the 🤗 transformers format
  2. Supports batching
  3. Supports 🤗 bettertransformer
  4. Supports the new 🤗 distil-whisper models (which are much, much faster)

Using one or all of these features leads to faster transcriptions (see the sketch below).

ayancey avatar Nov 03 '23 00:11 ayancey
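For context, a minimal sketch of what that 🤗 transformers-based approach looks like, assuming the transformers and torch packages are installed; the distil-whisper checkpoint, file name, and batch size are illustrative, not insanely-fast-whisper's exact code:

import torch
from transformers import pipeline

# ASR pipeline on a Whisper model in the 🤗 transformers format (points 1 and 4);
# chunking plus batch_size gives the batched decoding mentioned in point 2.
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",  # illustrative checkpoint
    torch_dtype=torch.float16,
    device="cuda:0",
)

# batch_size controls how many 30-second chunks are decoded in parallel.
result = pipe("myfile.mp3", chunk_length_s=30, batch_size=8, return_timestamps=True)
print(result["text"])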

Hm, so to answer OP's question: no, it shouldn't be too much work. In fact, point 3 is one line of code (see the sketch below), and points 1 and 3 share the same code for loading Hugging Face models. I believe batching is already handled to some extent when using this webservice, depending on your application; adding a batching option also shouldn't be hard, though. I can take a crack at it.

AnkushMalaker avatar Nov 03 '23 00:11 AnkushMalaker
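To illustrate the "one line of code" remark, a hedged sketch of point 3, assuming the optimum package is installed (the checkpoint name is only an example):

from transformers import AutoModelForSpeechSeq2Seq

# Load a Whisper checkpoint in the 🤗 transformers format (the same loading code as point 1) ...
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v2")

# ... then enabling 🤗 bettertransformer really is roughly one line (requires `pip install optimum`):
model = model.to_bettertransformer()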

Did anything come of this? The benchmarks posted by insanely-fast-whisper are hugely impressive versus just faster-whisper (1min18s for 150 minutes of audio on insanely-fast-whisper versus 8min15s for faster-whisper)

daytonturner avatar Jan 18 '24 23:01 daytonturner

Sorry, got busy. I'm sure speech research has progressed so far since the time this issue was opened. I've got a little time on my hands. Are there newer/better/faster models already supported or requested?

AnkushMalaker avatar May 06 '24 17:05 AnkushMalaker