
[BUG] CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

Open alex-vyverman opened this issue 1 year ago • 12 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

When running the gpu-tagged image, I get an error in the container that the CUDA driver version is insufficient for the CUDA runtime version. But unless I am mistaken, the image uses cudnn-cu12 and cublas-cu12, which should be compatible with the driver I run (NVIDIA-SMI 550.120, Driver Version 550.120, CUDA Version 12.4). Any ideas?

Expected Behavior

No response

Steps To Reproduce

  1. Run the image in the environment described below.

Environment

- OS: Ubuntu 24.04.1 LTS
- How docker service was installed: docker repo
- GPU: Nvidia RTX A4000
- nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
- nvidia-smi: NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4

CPU architecture

x86-64

Docker creation

services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=tiny-int8
      - WHISPER_BEAM=1 #optional
      - WHISPER_LANG=nl #optional
    volumes:
      - /mnt/local/container_data/whisper/data2:/config
    ports:
      - 10300:10300
    deploy:
      resources:
          reservations:
              generic_resources:
              - discrete_resource_spec:
                  kind: "NVIDIA-GPU"
                  value: 1

Container logs

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/lsiopy/lib/python3.12/site-packages/wyoming_faster_whisper/__main__.py", line 149, in <module>
    run()
  File "/lsiopy/lib/python3.12/site-packages/wyoming_faster_whisper/__main__.py", line 144, in run
    asyncio.run(main())
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/lsiopy/lib/python3.12/site-packages/wyoming_faster_whisper/__main__.py", line 119, in main
    whisper_model = faster_whisper.WhisperModel(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lsiopy/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 133, in __init__
    self.model = ctranslate2.models.Whisper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

alex-vyverman · Nov 29 '24 22:11

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.

github-actions[bot] · Nov 29 '24 22:11

I have a similar issue. Any ideas?

davidcampos · Dec 07 '24 21:12

same here.

gartensofa · Dec 20 '24 22:12

Had the same error. In my case I started the docker container with

services:
  faster-whisper:
    runtime: nvidia

but I was not exposing the NVIDIA GPU. Adding the following section fixed the issue for me:

services:
  faster-whisper:
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
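
For reference, merged with the compose file from the original post, the whole service would look roughly like this (untested sketch; the image, environment, volume and port values are copied from the report above):

# Untested sketch: the OP's service plus the GPU reservation above.
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:gpu
    container_name: faster-whisper
    runtime: nvidia
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=tiny-int8
    volumes:
      - /mnt/local/container_data/whisper/data2:/config
    ports:
      - 10300:10300
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]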

mreilaender · Dec 21 '24 18:12

On Unraid, on the Docker container page, I toggled Basic View to Advanced View, then next to 'Extra Parameters:' I added: --gpus=all

[screenshot of the Unraid Extra Parameters field]

wills106 · Dec 27 '24 11:12

This fixed it for me, thanks!

mobster1940 · Dec 31 '24 09:12

I am getting the same issue with Unraid. I attempted the "--gpus=all" parameter, but still no luck.

Phantom-Glass · Jan 02 '25 06:01

This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.

LinuxServer-CI · Feb 01 '25 10:02

Are you sure, @Phantom-Glass? I just fixed the "error" by adding --gpus all on Unraid 7.0.

klepptor · Feb 28 '25 08:02

> Are you sure, @Phantom-Glass? I just fixed the "error" by adding --gpus all on Unraid 7.0.

After updating to 7.0, adding that into extra parameters did start working for me.

Phantom-Glass · Feb 28 '25 09:02

Hey, I just tried with the extra parameter and the CUDA error is gone.

But it still doesn't seem to be working, because I found this in the logs:

INFO:faster_whisper:Processing audio with duration 00:02.820 INFO:wyoming_faster_whisper.handler:!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Any insight?

jokerigno · Mar 26 '25 23:03

This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.

LinuxServer-CI · Apr 26 '25 10:04

I'm attempting to deploy on a k8s server with exposed GPUs that is happily running models on GPU for Frigate and Ollama, but I'm getting this same error trying to run this container. Sadly not much to add other than that, but I want to scare off that bot before it marks this as stale and closes it 😀

Edit: Alright, solved it, though I did a couple of things. Since I had 3 pods requesting 3 GPUs and I only have 3 GPUs, I had been trying to not request one at all and see if it would run anyway (like nvidia-smi and Frigate both do if you don't set requests and limits). Instead, I set up time-sharing on the cluster so each GPU shows as 2, and then added a request and limit for 1 GPU (roughly as sketched below), so either or both of those may have been the solution for me.
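
For anyone else on Kubernetes, the request/limit I mean looks roughly like this (a minimal sketch, not my exact manifest; it assumes the NVIDIA device plugin exposes the nvidia.com/gpu resource, and the pod/container names are placeholders):

# Minimal sketch, assuming the NVIDIA device plugin is installed
# (resource name nvidia.com/gpu); names and port are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: faster-whisper
spec:
  containers:
    - name: faster-whisper
      image: lscr.io/linuxserver/faster-whisper:gpu
      ports:
        - containerPort: 10300
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1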

hyacin75 · May 27 '25 03:05

This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.

LinuxServer-CI · Jun 26 '25 10:06

This issue is locked due to inactivity

LinuxServer-CI · Sep 25 '25 11:09