whisper-asr-webservice
Added whisperX support
I added support for the whisperX engine. The engine can be activated by setting ASR_ENGINE to "whisperx". To use the diarization pipeline, a Hugging Face access token needs to be supplied via the HF_TOKEN variable. You also need to accept some user agreements (see https://github.com/m-bain/whisperX for further details). If you do not need diarization, the token is not required.
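For anyone who wants to try it, a minimal invocation looks roughly like this (a sketch: the image tag is illustrative and assumes you build from this PR branch):

# Build the image from this branch (the tag name is hypothetical)
docker build -t whisper-asr-webservice:whisperx .
# ASR_ENGINE selects the backend; HF_TOKEN is only needed for diarization
docker run -p 9000:9000 \
  -e ASR_ENGINE=whisperx \
  -e HF_TOKEN=hf_your_token_here \
  whisper-asr-webservice:whisperx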
Love it! Will test this when I can.
I tested it and was able to get it working. Great work! Please take a look at the changes in 1.2 and fix merge conflicts and I will approve it. 👍
I updated the code to resolve the merge conflicts. Since the documentation was moved out of the Readme, I need to update the documentation accordingly, so this PR is not yet ready to be merged.
I'm trying to test on my end; I cloned your repo and it failed to build on my MacBook. Will try on another machine shortly.
42.84 Building wheels for collected packages: antlr4-python3-runtime, docopt, julius, psutil, ruamel.yaml.clib
42.84 Building wheel for antlr4-python3-runtime (pyproject.toml): started
43.00 Building wheel for antlr4-python3-runtime (pyproject.toml): finished with status 'done'
43.00 Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=7f5454ecd9008d2b061876f05291a060ecfa370fbff31c2d359a4584ab11d6e4
43.00 Stored in directory: /root/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88
43.00 Building wheel for docopt (pyproject.toml): started
43.11 Building wheel for docopt (pyproject.toml): finished with status 'done'
43.11 Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13705 sha256=a6d443258a1b8ab52eb345321238fb4d53214297b9f8733e31cd348eea265945
43.11 Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac
43.11 Building wheel for julius (pyproject.toml): started
43.22 Building wheel for julius (pyproject.toml): finished with status 'done'
43.22 Created wheel for julius: filename=julius-0.2.7-py3-none-any.whl size=21868 sha256=61232d4bf4b2d6a642c6f91c03ebf9248d58cd699b221b49e3f3faf03ddee1ce
43.22 Stored in directory: /root/.cache/pip/wheels/b9/b2/05/f883527ffcb7f2ead5438a2c23439aa0c881eaa9a4c80256f4
43.22 Building wheel for psutil (pyproject.toml): started
43.34 Building wheel for psutil (pyproject.toml): finished with status 'error'
43.35 error: subprocess-exited-with-error
43.35
43.35 × Building wheel for psutil (pyproject.toml) did not run successfully.
43.35 │ exit code: 1
43.35 ╰─> [43 lines of output]
43.35 running bdist_wheel
43.35 running build
43.35 running build_py
43.35 creating build
43.35 creating build/lib.linux-aarch64-cpython-310
43.35 creating build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_pslinux.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_compat.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_pswindows.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_common.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_psposix.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_pssunos.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_psaix.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/__init__.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_psosx.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 copying psutil/_psbsd.py -> build/lib.linux-aarch64-cpython-310/psutil
43.35 creating build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_memleaks.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/runner.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_misc.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_testutils.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_connections.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_posix.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_bsd.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/__main__.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_aix.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_sunos.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_process.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_linux.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_windows.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/__init__.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_contracts.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_osx.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_unicode.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 copying psutil/tests/test_system.py -> build/lib.linux-aarch64-cpython-310/psutil/tests
43.35 running build_ext
43.35 building 'psutil._psutil_linux' extension
43.35 creating build/temp.linux-aarch64-cpython-310
43.35 creating build/temp.linux-aarch64-cpython-310/psutil
43.35 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_SIZEOF_PID_T=4 -DPSUTIL_VERSION=595 -DPy_LIMITED_API=0x03060000 -DPSUTIL_ETHTOOL_MISSING_TYPES=1 -DPSUTIL_LINUX=1 -I/app/.venv/include -I/usr/local/include/python3.10 -c psutil/_psutil_common.c -o build/temp.linux-aarch64-cpython-310/psutil/_psutil_common.o
43.35 psutil could not be installed from sources because gcc is not installed. Try running:
43.35 sudo apt-get install gcc python3-dev
43.35 error: command 'gcc' failed: No such file or directory
43.35 [end of output]
43.35
43.35 note: This error originates from a subprocess, and is likely not a problem with pip.
43.35 ERROR: Failed building wheel for psutil
43.35 Building wheel for ruamel.yaml.clib (pyproject.toml): started
43.44 Building wheel for ruamel.yaml.clib (pyproject.toml): finished with status 'error'
43.45 error: subprocess-exited-with-error
43.45
43.45 × Building wheel for ruamel.yaml.clib (pyproject.toml) did not run successfully.
43.45 │ exit code: 1
43.45 ╰─> [16 lines of output]
43.45 running bdist_wheel
43.45 running build
43.45 running build_py
43.45 creating build
43.45 creating build/lib.linux-aarch64-cpython-310
43.45 creating build/lib.linux-aarch64-cpython-310/ruamel
43.45 creating build/lib.linux-aarch64-cpython-310/ruamel/yaml
43.45 creating build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45 copying ./setup.py -> build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45 copying ./__init__.py -> build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45 copying ./LICENSE -> build/lib.linux-aarch64-cpython-310/ruamel/yaml/clib
43.45 running build_ext
43.45 building '_ruamel_yaml' extension
43.45 creating build/temp.linux-aarch64-cpython-310
43.45 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/app/.venv/include -I/usr/local/include/python3.10 -c _ruamel_yaml.c -o build/temp.linux-aarch64-cpython-310/_ruamel_yaml.o
43.45 error: command 'gcc' failed: No such file or directory
43.45 [end of output]
43.45
43.45 note: This error originates from a subprocess, and is likely not a problem with pip.
43.45 ERROR: Failed building wheel for ruamel.yaml.clib
43.45 Successfully built antlr4-python3-runtime docopt julius
43.45 Failed to build psutil ruamel.yaml.clib
43.45 ERROR: Could not build wheels for psutil, ruamel.yaml.clib, which is required to install pyproject.toml-based projects
------
Dockerfile:28
--------------------
26 | RUN poetry install
27 |
28 | >>> RUN $POETRY_VENV/bin/pip install pandas transformers nltk pyannote.audio
29 | RUN git clone --depth 1 https://github.com/m-bain/whisperX.git \
30 | && cd whisperX \
--------------------
ERROR: failed to solve: process "/bin/sh -c $POETRY_VENV/bin/pip install pandas transformers nltk pyannote.audio" did not complete successfully: exit code: 1
error: command 'gcc' failed: No such file or directory
You need gcc from xcode or homebrew. Also, poetry is a huge pain in the ass.
This was within the docker container. I haven't tried running it natively. I added notes and changes.
Thanks for this.
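(For reference, the fix being described here amounts to installing a compiler and the Python headers in the image before pip builds the wheels; a sketch, with package names taken from the psutil error output above:)

# added to the Dockerfile's apt-get install step so psutil and
# ruamel.yaml.clib can compile from source on ARM
apt-get -qq update && apt-get -qq install --no-install-recommends gcc python3-dev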
Oh, apologies. I'll look into this for you and try testing again. I don't know how it built for me without gcc.
I can replicate your issue when building the Docker image on Apple's M1. This is probably related to a missing precompiled Python wheel, causing the ARM architecture to require a compiler at build time. While your suggested fix solves the build issue for me, I still run into issues when trying to transcribe an MP3, causing a crash of the Docker container:
[2023-10-09 10:44:49 +0000] [31] [INFO] Started server process [31]
[2023-10-09 10:44:49 +0000] [31] [INFO] Waiting for application startup.
[2023-10-09 10:44:49 +0000] [31] [INFO] Application startup complete.
[2023-10-09 10:45:29 +0000] [1] [WARNING] Worker with pid 31 was terminated due to signal 11
[2023-10-09 10:45:29 +0000] [55] [INFO] Booting worker with pid: 55
/app/.venv/lib/python3.10/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
/app/.venv/lib/python3.10/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
torchvision is not available - cannot save figures
[2023-10-09 10:45:32 +0000] [55] [INFO] Started server process [55]
[2023-10-09 10:45:32 +0000] [55] [INFO] Waiting for application startup.
[2023-10-09 10:45:32 +0000] [55] [INFO] Application startup complete
This issue persists even when setting ASR_ENGINE to openai_whisper, but not when using onerahmet/openai-whisper-asr-webservice:latest as the base image. @dahifi can you replicate this issue on your side, or does the image work when using your suggested fix?
I have successfully run previous versions of the ASR engine, in Docker containers, on both the M1 and WSL with CUDA.
Last night, on my WSL box, I attempted running the DennisTheD:main image, and was able to use the swagger interface to render a test file using the whisperX engine. Diarization tests using txt output rendered the transcript without diarization notations. It did not use CUDA, but the CPU instead. Attempts at diarization with other file formats caused an exception in the SRT/VTT export; I don't recall which one.
What is it you need me to validate? M1 native or Docker?
Tested with Docker with GPU. Standard transcriptions work without diarization (diarize=false).
However, diarization (diarize=true, min=1, max=3) fails with a response body of Internal Server Error. The logs point to NameError: name 'diarize_model' is not defined.
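A request of roughly this shape reproduces the failure (a sketch; the diarize/min_speakers/max_speakers query parameter names are assumptions based on the swagger UI and the traceback further down):

# multipart upload to the /asr endpoint with diarization enabled
curl -X POST "http://localhost:9008/asr?output=txt&diarize=true&min_speakers=1&max_speakers=3" \
  -F "audio_file=@test.mp3"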
Testing Prep
# Working Dir
WORKING_DIRECTORY="/mnt/user/docker/whisper-asr-webservice"
mkdir -p "${WORKING_DIRECTORY}"
cd ${WORKING_DIRECTORY}
# Make Folders & Files
mkdir -p ./cache/{pip,poetry,whisper,faster-whisper}
ls -alt ${WORKING_DIRECTORY}/cache
# Clone Repository
git clone https://github.com/DennisTheD/whisper-asr-webservice.git whisper-asr-webservice_DennisTheD
# https://github.com/ahmetoner/whisper-asr-webservice/pull/125
# NOTE: The engine can be activated by setting the ASR_ENGINE to "whisperx". In order to use the diarization pipeline, a Huggingface access token needs to be supplied, using the "HF_TOKEN" variable. You also need to accept some user agreements (see https://github.com/m-bain/whisperX for further details). If you do not need diarization, the token is not required.
cd whisper-asr-webservice_DennisTheD/
# git clean -fd
# git reset --hard
git pull
cd ..
Docker Compose File
version: "3.4"
services:
whisper-asr-webservice-x-gpu:
build:
context: ./whisper-asr-webservice_DennisTheD
dockerfile: Dockerfile.gpu
# image: onerahmet/openai-whisper-asr-webservice:latest #v1.0.6 #onerahmet/openai-whisper-asr-webservice:v1.0.6-gpu #v1.1.0-gpu #latest-gpu
container_name: whisper-asr-webservice_x_gpu
restart: unless-stopped
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
environment:
- ASR_ENGINE=whisperx
- ASR_MODEL=large # large-v2 # medium.en
- HOST_OS="Unraid"
- HOST_HOSTNAME="UnRAID-02"
- HOST_CONTAINERNAME="whisper-asr-webservice_x_cpu"
labels:
- "net.unraid.docker.managed=dockerman"
- "net.unraid.docker.description=Whisper ASR Webservice is a general-purpose speech recognition webservice."
- "net.unraid.docker.webui=http://[IP]:[PORT:9008]/"
- "net.unraid.docker.icon=https://res.cloudinary.com/apideck/image/upload/v1667440836/marketplaces/ckhg56iu1mkpc0b66vj7fsj3o/listings/14957082_wyd29r.png"
ports:
- 9008:9000
volumes:
# - ./app:/app/app
- cache-pip:/root/.cache/pip
- cache-poetry:/root/.cache/poetry
- cache-whisper:/root/.cache/whisper # "/mnt/user/docker/whisper-asr-webservice/cache:/root/.cache/whisper"
- cache-faster-whisper:/root/.cache/faster_whisper
volumes:
# cache-pip:
# cache-poetry:
# cache-whisper:
# cache-faster-whisper:
cache-pip:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/pip
cache-poetry:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/poetry
cache-whisper:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/whisper
cache-faster-whisper:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/faster-whisper
Docker Build & Run
docker-compose pull
DOCKER_BUILDKIT=1 docker-compose build --no-cache
docker-compose down --volumes
docker-compose up --detach --remove-orphans --force-recreate
docker-compose logs --follow
Error Message:
[2023-10-09 19:08:11 +0000] [28] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/app/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 404, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/app/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/app/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/app/.venv/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/app/.venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/app/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/app/app/webservice.py", line 89, in asr
    result = transcribe(
  File "/app/app/mbain_whisperx/core.py", line 62, in transcribe
    diarize_segments = diarize_model(audio, min_speakers, max_speakers)
NameError: name 'diarize_model' is not defined
@AustinSaintAubin Did you provide the HF_TOKEN? That's required for diarization.
testing:
- I was able to build the image on M1 mac once I made @dahifi's changes, but I couldn't get it to run, maybe a CPU or RAM limitation. Running on M1 with GPU accel doesn't seem like something we can do at this time, see this discussion.
- I tested on my Windows PC with Docker Desktop. GPU accel and WhisperX working nicely. I tried providing the HF token and tested diarization, but my WSL and whole computer crashed for some reason. Won't be able to try that again until after working hours.
additional thoughts:
- Would be nice if we could get rid of the diarize param entirely when the HF token isn't provided. Right now it fails with a 500 if you don't provide the token. (as seen above)
- ARM docker images would be awesome, but shouldn't block us from merging this PR
- Should include gcc and python3-dev as suggested by @dahifi so ARM users can at least use it with CPU.
- Do we have a standardized format for JSON output depending on which backend is used?
Running on M1 with GPU accel doesn't seem like something we can do at this time
I haven't seen anything in the whisper community that can run M1 on anything other than CPU.
- Should include gcc and python3-dev as suggested by @dahifi so ARM users can at least use it with CPU.
Again, the default engine runs fine with CUDA using the current docker image on my Win10 machine, although now I'm starting to question whether I pulled that in WSL or not. I've also been able to run https://github.com/MahmoudAshraf97/whisper-diarization in WSL with GPU support, but I remember I had some issues getting it going because of dependencies.
So I guess what I'm asking is whether this is a whisperx issue or something with my setup.
I am able to get WhisperX on CUDA working with WSL. Can you post which GPU drivers you have? It could be a weird issue with driver and CUDA incompatibility. I am running 537.13 on a GeForce RTX 2080 Ti.
(thanks for notifying me @ayancey)
I built the docker gpu image. And I had some problems related to the HF_TOKEN, where it likely wouldn't get recognized from the docker-compose.yml. Or maybe there was a delay with the accepted user conditions. The container exited with:
whisperx-asr | [2023-10-10 15:41:16 +0000] [28] [ERROR] Exception in worker process
whisperx-asr | Traceback (most recent call last):
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
whisperx-asr | worker.init_process()
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
whisperx-asr | super(UvicornWorker, self).init_process()
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
whisperx-asr | self.load_wsgi()
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
whisperx-asr | self.wsgi = self.app.wsgi()
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
whisperx-asr | self.callable = self.load()
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
whisperx-asr | return self.load_wsgiapp()
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
whisperx-asr | return util.import_app(self.app_uri)
whisperx-asr | File "/app/.venv/lib/python3.10/site-packages/gunicorn/util.py", line 359, in import_app
whisperx-asr | mod = importlib.import_module(module)
whisperx-asr | File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
whisperx-asr | return _bootstrap._gcd_import(name[level:], package, level)
whisperx-asr | File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
whisperx-asr | File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
whisperx-asr | File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
whisperx-asr | File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
whisperx-asr | File "<frozen importlib._bootstrap_external>", line 883, in exec_module
whisperx-asr | File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
whisperx-asr | File "/app/app/webservice.py", line 18, in <module>
whisperx-asr | from .mbain_whisperx.core import transcribe, language_detection
whisperx-asr | File "/app/app/mbain_whisperx/core.py", line 18, in <module>
whisperx-asr | diarize_model = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
whisperx-asr | File "/app/whisperX/whisperx/diarize.py", line 19, in __init__
whisperx-asr | self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
whisperx-asr | AttributeError: 'NoneType' object has no attribute 'to'
whisperx-asr | [2023-10-10 15:41:16 +0000] [28] [INFO] Worker exiting (pid: 28)
whisperx-asr |
whisperx-asr | Could not download 'pyannote/speaker-diarization-3.0' pipeline.
whisperx-asr | It might be because the pipeline is private or gated so make
whisperx-asr | sure to authenticate. Visit https://hf.co/settings/tokens to
whisperx-asr | create your access token and retry with:
whisperx-asr |
whisperx-asr | >>> Pipeline.from_pretrained('pyannote/speaker-diarization-3.0',
whisperx-asr | ... use_auth_token=YOUR_AUTH_TOKEN)
whisperx-asr |
whisperx-asr | If this still does not work, it might be because the pipeline is gated:
whisperx-asr | visit https://hf.co/pyannote/speaker-diarization-3.0 to accept the user conditions.
whisperx-asr | [2023-10-10 15:41:17 +0000] [27] [INFO] Shutting down: Master
whisperx-asr | [2023-10-10 15:41:17 +0000] [27] [INFO] Reason: Worker failed to boot.
the env of my docker-compose.yml:
environment:
  - ASR_MODEL=large-v2
  - HF_TOKEN="hf_jbseggsegssomethingJsgqBgeeeeeV"
  - ASR_ENGINE=whisperx
So I pasted it into ./app/mbain_whisperx/core.py and that got it to work. Will need to retest with the env again.
Is batched inference being used so far? The large model used almost 10 GB of VRAM on my 3060 and it wasn't more performant/faster than the normal faster-whisper implementation. Will test more and update when I have the time 👍
To be honest, I don't know. I'll do some benchmarks comparing the speed of all three backends. I'm most excited for the increased accuracy of timestamps and diarization.
@Deathproof76 I'm not sure if it's the same, but the WhisperX requirements noted 3 gated HF models that I needed to accept. There was one additional one. See the note in your error: 'pyannote/speaker-diarization-3.0'
@ayancey I'm using 537.58.
I originally ran the readme's command that pulls the image from Docker Hub. That's the one that uses CUDA. I'm going to go back to square one and see if I can build it from source and have it run the same. Right now I have this PR as a separate remote and I'm not comparing apples to apples.
@AustinSaintAubin Did you provide the HF_TOKEN? That's required for diarization.
I have tested again with the environment variable set, and checked that the dependent pipeline pyannote/speaker-diarization-3.0 is not gated for me... still not able to download the 'pyannote/speaker-diarization-3.0' pipeline. Not sure if HF_TOKEN is being passed or handled correctly.
environment:
  - HF_TOKEN="hf_thehuggingfacetokenformyaccount"
https://huggingface.co/pyannote/speaker-diarization-3.0
Gated model: You have been granted access to this model
...
whisper-asr-webservice_x_gpu | Could not download 'pyannote/speaker-diarization-3.0' pipeline.
whisper-asr-webservice_x_gpu | It might be because the pipeline is private or gated so make
whisper-asr-webservice_x_gpu | sure to authenticate. Visit https://hf.co/settings/tokens to
whisper-asr-webservice_x_gpu | create your access token and retry with:
whisper-asr-webservice_x_gpu |
whisper-asr-webservice_x_gpu | >>> Pipeline.from_pretrained('pyannote/speaker-diarization-3.0',
whisper-asr-webservice_x_gpu | ... use_auth_token=YOUR_AUTH_TOKEN)
...
Next step for me will probably be looking at the whisperx repo directly and see if I can get that to work anywhere first.
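For reference, the upstream repo exposes a CLI that exercises the same diarization pipeline directly, which is a quick way to isolate whisperX from this webservice (a sketch based on the whisperX README; flag names may drift between versions):

pip install git+https://github.com/m-bain/whisperX.git
# --diarize pulls the same gated pyannote models, so it needs the HF token too
whisperx test.mp3 --model large-v2 --diarize --hf_token hf_your_token_here --min_speakers 1 --max_speakers 3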
Try making a new token. This took a couple tries for me to get working on the original WhisperX repo. I don't think it's related to this PR.
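One way to sanity-check a token outside the container is the public Hugging Face HTTP API (a sketch; both calls should return JSON rather than a 401/403 when the token is valid and the gated model has been accepted):

curl -s -H "Authorization: Bearer $HF_TOKEN" https://huggingface.co/api/whoami-v2
curl -s -H "Authorization: Bearer $HF_TOKEN" https://huggingface.co/api/models/pyannote/speaker-diarization-3.0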
@ayancey, I have tried a new token and still run into issues after building the container. I have tested both the GPU and CPU versions. I have included the build steps, Dockerfile, and compose file below.
Build and Run
DOCKER_BUILDKIT=1 docker-compose build --no-cache
docker-compose down --volumes
docker-compose up --detach --remove-orphans --force-recreate
docker-compose logs --follow
CPU Dockerfile
FROM swaggerapi/swagger-ui:v4.18.2 AS swagger-ui
FROM python:3.10-slim

ENV POETRY_VENV=/app/.venv

RUN export DEBIAN_FRONTEND=noninteractive \
    && apt-get -qq update \
    && apt-get -qq install --no-install-recommends \
        ffmpeg \
        git \
        gcc \
        python3-dev \
    && rm -rf /var/lib/apt/lists/*

RUN python3 -m venv $POETRY_VENV \
    && $POETRY_VENV/bin/pip install -U pip setuptools \
    && $POETRY_VENV/bin/pip install poetry==1.6.1

ENV PATH="${PATH}:${POETRY_VENV}/bin"

WORKDIR /app

COPY . /app
COPY --from=swagger-ui /usr/share/nginx/html/swagger-ui.css swagger-ui-assets/swagger-ui.css
COPY --from=swagger-ui /usr/share/nginx/html/swagger-ui-bundle.js swagger-ui-assets/swagger-ui-bundle.js

RUN poetry config virtualenvs.in-project true
RUN poetry install
RUN $POETRY_VENV/bin/pip install pandas transformers nltk pyannote.audio
RUN git clone --depth 1 https://github.com/m-bain/whisperX.git \
    && cd whisperX \
    && $POETRY_VENV/bin/pip install -e .

EXPOSE 9000

ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:9000", "--workers", "1", "--timeout", "0", "app.webservice:app", "-k", "uvicorn.workers.UvicornWorker"]
Docker Compose
version: "3.4"
services:
whisper-asr-webservice-x-cpu:
build:
context: ./repositories/whisper-asr-webservice_DennisTheD
dockerfile: Dockerfile
# image: onerahmet/openai-whisper-asr-webservice:latest #v1.0.6 #onerahmet/openai-whisper-asr-webservice:v1.0.6-gpu #v1.1.0-gpu #latest-gpu
container_name: whisper-asr-webservice_x_cpu
restart: unless-stopped
environment:
- HF_TOKEN=hf_USEYOUROWN
- ASR_ENGINE=whisperx
- ASR_MODEL=large # large-v2 # medium.en
- HOST_OS="Unraid"
- HOST_HOSTNAME="UnRAID-02"
- HOST_CONTAINERNAME="whisper-asr-webservice_x_cpu"
labels:
- "net.unraid.docker.managed=dockerman"
- "net.unraid.docker.description=Whisper ASR Webservice is a general-purpose speech recognition webservice."
- "net.unraid.docker.webui=http://[IP]:[PORT:9007]/"
- "net.unraid.docker.icon=https://res.cloudinary.com/apideck/image/upload/v1667440836/marketplaces/ckhg56iu1mkpc0b66vj7fsj3o/listings/14957082_wyd29r.png"
ports:
- 9007:9000
volumes:
# - ./app:/app/app
- cache-pip:/root/.cache/pip
- cache-poetry:/root/.cache/poetry
- cache-whisper:/root/.cache/whisper # "/mnt/user/docker/whisper-asr-webservice/cache:/root/.cache/whisper"
- cache-faster-whisper:/root/.cache/faster_whisper
# whisper-asr-webservice-x-gpu:
# build:
# context: ./repositories/whisper-asr-webservice_DennisTheD
# dockerfile: Dockerfile.gpu
# # image: onerahmet/openai-whisper-asr-webservice:latest #v1.0.6 #onerahmet/openai-whisper-asr-webservice:v1.0.6-gpu #v1.1.0-gpu #latest-gpu
# container_name: whisper-asr-webservice_x_gpu
# restart: unless-stopped
# # env_file: .env
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# # count: 1
# device_ids: ['1'] # sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.2.0-base-ubuntu20.04 nvidia-smi
# capabilities: [gpu]
# environment:
# - HF_TOKEN="hf_USEYOUROWN"
# - ASR_ENGINE=whisperx
# - ASR_MODEL=large # large-v2 # medium.en
# - HOST_OS="Unraid"
# - HOST_HOSTNAME="UnRAID-02"
# - HOST_CONTAINERNAME="whisper-asr-webservice_x_cpu"
# labels:
# - "net.unraid.docker.managed=dockerman"
# - "net.unraid.docker.description=Whisper ASR Webservice is a general-purpose speech recognition webservice."
# - "net.unraid.docker.webui=http://[IP]:[PORT:9008]/"
# - "net.unraid.docker.icon=https://res.cloudinary.com/apideck/image/upload/v1667440836/marketplaces/ckhg56iu1mkpc0b66vj7fsj3o/listings/14957082_wyd29r.png"
# ports:
# - 9008:9000
# volumes:
# # - ./app:/app/app
# - cache-pip:/root/.cache/pip
# - cache-poetry:/root/.cache/poetry
# - cache-whisper:/root/.cache/whisper # "/mnt/user/docker/whisper-asr-webservice/cache:/root/.cache/whisper"
# - cache-faster-whisper:/root/.cache/faster_whisper
volumes:
# cache-pip:
# cache-poetry:
# cache-whisper:
# cache-faster-whisper:
cache-pip:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/pip
cache-poetry:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/poetry
cache-whisper:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/whisper
cache-faster-whisper:
driver: local
driver_opts:
o: bind
type: none
device: ./cache/faster-whisper
Here is the Docker log output for the CPU version.
whisper-asr-webservice_x_cpu | [2023-10-12 18:45:55 +0000] [1] [INFO] Starting gunicorn 20.1.0
whisper-asr-webservice_x_cpu | [2023-10-12 18:45:55 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
whisper-asr-webservice_x_cpu | [2023-10-12 18:45:55 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
whisper-asr-webservice_x_cpu | [2023-10-12 18:45:55 +0000] [7] [INFO] Booting worker with pid: 7
whisper-asr-webservice_x_cpu | /app/.venv/lib/python3.10/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
whisper-asr-webservice_x_cpu | torchaudio.set_audio_backend("soundfile")
whisper-asr-webservice_x_cpu | /app/.venv/lib/python3.10/site-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
whisper-asr-webservice_x_cpu | torchaudio.set_audio_backend("soundfile")
whisper-asr-webservice_x_cpu | torchvision is not available - cannot save figures
Downloading (…)lve/main/config.yaml: 100%|██████████| 467/467 [00:00<00:00, 2.07MB/s]
whisper-asr-webservice_x_cpu | [2023-10-12 18:46:21 +0000] [7] [ERROR] Exception in worker process
whisper-asr-webservice_x_cpu | Traceback (most recent call last):
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
whisper-asr-webservice_x_cpu | worker.init_process()
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
whisper-asr-webservice_x_cpu | super(UvicornWorker, self).init_process()
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
whisper-asr-webservice_x_cpu | self.load_wsgi()
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
whisper-asr-webservice_x_cpu | self.wsgi = self.app.wsgi()
whisper-asr-webservice_x_cpu |
whisper-asr-webservice_x_cpu | Could not download 'pyannote/segmentation-3.0' model.
whisper-asr-webservice_x_cpu | It might be because the model is private or gated so make
whisper-asr-webservice_x_cpu | sure to authenticate. Visit https://hf.co/settings/tokens to
whisper-asr-webservice_x_cpu | create your access token and retry with:
whisper-asr-webservice_x_cpu |
whisper-asr-webservice_x_cpu | >>> Model.from_pretrained('pyannote/segmentation-3.0',
whisper-asr-webservice_x_cpu | ... use_auth_token=YOUR_AUTH_TOKEN)
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
whisper-asr-webservice_x_cpu | self.callable = self.load()
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
whisper-asr-webservice_x_cpu | return self.load_wsgiapp()
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
whisper-asr-webservice_x_cpu | return util.import_app(self.app_uri)
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/gunicorn/util.py", line 359, in import_app
whisper-asr-webservice_x_cpu | mod = importlib.import_module(module)
whisper-asr-webservice_x_cpu | File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
whisper-asr-webservice_x_cpu | return _bootstrap._gcd_import(name[level:], package, level)
whisper-asr-webservice_x_cpu | File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
whisper-asr-webservice_x_cpu | File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
whisper-asr-webservice_x_cpu | File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
whisper-asr-webservice_x_cpu | File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
whisper-asr-webservice_x_cpu | File "<frozen importlib._bootstrap_external>", line 883, in exec_module
whisper-asr-webservice_x_cpu | File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
whisper-asr-webservice_x_cpu | File "/app/app/webservice.py", line 18, in <module>
whisper-asr-webservice_x_cpu | from .mbain_whisperx.core import transcribe, language_detection
whisper-asr-webservice_x_cpu | File "/app/app/mbain_whisperx/core.py", line 23, in <module>
whisper-asr-webservice_x_cpu | diarize_model = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
whisper-asr-webservice_x_cpu | File "/app/whisperX/whisperx/diarize.py", line 19, in __init__
whisper-asr-webservice_x_cpu | self.model = Pipeline.from_pretrained(model_name, use_auth_token=use_auth_token).to(device)
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 136, in from_pretrained
whisper-asr-webservice_x_cpu | pipeline = Klass(**params)
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 128, in __init__
whisper-asr-webservice_x_cpu | model: Model = get_model(segmentation, use_auth_token=use_auth_token)
whisper-asr-webservice_x_cpu | File "/app/.venv/lib/python3.10/site-packages/pyannote/audio/pipelines/utils/getter.py", line 89, in get_model
whisper-asr-webservice_x_cpu | model.eval()
whisper-asr-webservice_x_cpu | AttributeError: 'NoneType' object has no attribute 'eval'
whisper-asr-webservice_x_cpu | [2023-10-12 18:46:21 +0000] [7] [INFO] Worker exiting (pid: 7)
whisper-asr-webservice_x_cpu |
whisper-asr-webservice_x_cpu | If this still does not work, it might be because the model is gated:
whisper-asr-webservice_x_cpu | visit https://hf.co/pyannote/segmentation-3.0 to accept the user conditions.
whisper-asr-webservice_x_cpu | [2023-10-12 18:46:23 +0000] [1] [INFO] Shutting down: Master
whisper-asr-webservice_x_cpu | [2023-10-12 18:46:23 +0000] [1] [INFO] Reason: Worker failed to boot.
@AustinSaintAubin You need to get access to both models: https://huggingface.co/pyannote/speaker-diarization https://huggingface.co/pyannote/segmentation
It looks like you got the access for one, but not the other.
https://huggingface.co/pyannote/speaker-diarization
Gated model: You have been granted access to this model
https://huggingface.co/pyannote/segmentation
Gated model: You have been granted access to this model
Sorry, I had not mentioned it: I had already visited all three repos and accepted the EULAs.
I think pyannote released a newer segmentation model, v3 (https://huggingface.co/pyannote/segmentation-3.0). After accepting the EULA it should work fine (at least for me on Windows+WSL).
That was it, at least for the CPU version (the GPU version is still having the same issues as before); I accepted the EULA for segmentation-3.0 and it is now working as expected.
Hi! I would really like to see WhisperX support in the project. Is it possible to somehow speed up the code review? Maybe you need some help?
Dear @m-bain,
I have concerns regarding the WhisperX license. To avoid potential conflicts, would it be sufficient to address the license requirements in a manner similar to other tools, as outlined in the following features sections?
- https://github.com/ahmetoner/whisper-asr-webservice#features
- https://ahmetoner.github.io/whisper-asr-webservice/#features
WhisperX has a pretty fair license: https://github.com/m-bain/whisperX/blob/main/LICENSE
So I can confirm that I was able to update the Docker config by adding
  - ASR_ENGINE=whisperx
  - HF_TOKEN=
And I can confirm that it's offloading to my GPU. That said, I still can't confirm the output yet: TXT files are no good, and I get KeyError: 'max_line_width' when selecting VTT. I'm trying another test with a smaller file, but basically transcription works and diarization does not.