
stdin support

Open cianyyz opened this issue 2 years ago • 17 comments

cat sound.wav | whisper-ctranslate2

Changed src/whisper-ctranslate2.py: the check for "no audio CLI argument and no live transcribe" is updated so that, if stdin data is present, a temporary file is created, the stdin data is written to it, and the audio CLI argument is set to the path of that temporary file.

The purpose of this is to allow programs written in other languages to pipe data into whisper-ctranslate2.

cianyyz avatar Nov 23 '23 02:11 cianyyz

Tomorrow we will have a new release for this GPU issue.

ahmetoner avatar May 17 '23 06:05 ahmetoner

Great! Looking forward to it!

themantalope avatar May 17 '23 15:05 themantalope

Could you please test the debug image to verify that the GPU issue is gone?

docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base onerahmet/openai-whisper-asr-webservice:debug-gpu

ahmetoner avatar May 19 '23 00:05 ahmetoner

@ahmetoner - I am having the same GPU issue. I've tried the debug-gpu, v1.1.0-gpu, and latest-gpu image tags, but all files are still transcribed on the CPU.

In the screenshot you can see that I run nvidia-smi and my four GPUs are available to the container; however, from inside the activated venv, torch does not recognize them.

[Screenshot, 2023-05-30 4:00 PM: nvidia-smi output inside the container]

dickiesanders avatar May 30 '23 20:05 dickiesanders

Sorry, I wasn't able to test until now (had to take a board exam). It's still using the CPU. Here are the logs from the Docker container:

[2023-06-03 15:33:39 +0000] [7] [INFO] Starting gunicorn 20.1.0

[2023-06-03 15:33:39 +0000] [7] [INFO] Listening at: http://0.0.0.0:9000 (7)

[2023-06-03 15:33:39 +0000] [7] [INFO] Using worker: uvicorn.workers.UvicornWorker

[2023-06-03 15:33:39 +0000] [8] [INFO] Booting worker with pid: 8

/app/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)

  return torch._C._cuda_getDeviceCount() > 0


  0%|                                               | 0.00/461M [00:00<?, ?iB/s]
[model download progress trimmed]
100%|███████████████████████████████████████| 461M/461M [00:12<00:00, 37.5MiB/s]

[2023-06-03 15:33:56 +0000] [8] [INFO] Started server process [8]

[2023-06-03 15:33:56 +0000] [8] [INFO] Waiting for application startup.

[2023-06-03 15:33:56 +0000] [8] [INFO] Application startup complete.

/app/.venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead

  warnings.warn("FP16 is not supported on CPU; using FP32 instead")

themantalope avatar Jun 03 '23 15:06 themantalope

I managed to get it working by adding "--runtime nvidia" and "-e NVIDIA_VISIBLE_DEVICES=all".
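Putting those flags together with the image tag used earlier in the thread, the invocation would look something like this (a sketch of the reported fix, not an officially documented command line):

```shell
# Force the NVIDIA container runtime and expose all GPUs to the container
docker run -d --runtime nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -p 9000:9000 -e ASR_MODEL=base \
  onerahmet/openai-whisper-asr-webservice:latest-gpu
```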

qjao avatar Jun 04 '23 20:06 qjao

Which image did you use, @qjao ?

themantalope avatar Jun 04 '23 20:06 themantalope

Which image did you use, @qjao ?

onerahmet/openai-whisper-asr-webservice:latest-gpu

qjao avatar Jun 04 '23 20:06 qjao

Thanks, but no luck.

@ahmetoner could you share which GPUs and CUDA versions you've tested this build with? I'm wondering if I need to build with a different CUDA version. I'm on a machine with an NVIDIA Titan RTX and a GeForce GTX 1070.

themantalope avatar Jun 04 '23 20:06 themantalope

Having the same issue with CUDA 12.2; I can't get this to use the GPU. Very cool project. CPU works, though I've noticed a lot of words aren't transcribed correctly, and sentences from different speakers run together. I'll test the debug image and see if that helps.

smitty-nieto avatar Sep 29 '23 21:09 smitty-nieto

Having the same issue with CUDA 12.2; I can't get this to use the GPU. Very cool project. CPU works, though I've noticed a lot of words aren't transcribed correctly, and sentences from different speakers run together. I'll test the debug image and see if that helps.

Have you had success getting GPU to work with the regular openai/whisper repo?

ayancey avatar Sep 29 '23 21:09 ayancey

Thanks, but no luck.

@ahmetoner could you share which GPUs and CUDA versions you've tested this build with? I'm wondering if I need to build with a different CUDA version. I'm on a machine with an NVIDIA Titan RTX and a GeForce GTX 1070.

Which driver are you on? It's probably an incompatibility between the CUDA and NVIDIA driver versions. Please refer to this compatibility matrix and make sure your driver is supported. The latest version of this project uses CUDA 11.8; your driver could be too old or too new for it.
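A quick way to check that pairing from inside the container (assumes torch is installed in the image's venv, as the logs above suggest):

```shell
# Report the driver version the container sees
nvidia-smi --query-gpu=driver_version,name --format=csv
# Report the CUDA version torch was built against and whether it can use the GPU
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```

If `nvidia-smi` works but `torch.cuda.is_available()` prints `False`, the driver/CUDA mismatch described above is the likely cause.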

It works for me using a GeForce GTX 1650 on 525.125.06. Thank you.

ayancey avatar Oct 04 '23 21:10 ayancey

I could use the GPUs with the v1.2.0-gpu image. However, it always chose GPU number 0, irrespective of what I specified with --gpus: I tried all, 7, and all with -e "NVIDIA_VISIBLE_DEVICES=7", but it was always GPU 0, not 7.

DavidNemeskey avatar Oct 18 '23 13:10 DavidNemeskey

I could use the GPUs with the v1.2.0-gpu image. However, it always chose GPU number 0, irrespective of what I specified with --gpus: I tried all, 7, and all with -e "NVIDIA_VISIBLE_DEVICES=7", but it was always GPU 0, not 7.

Did you get this to work? Sounds like it could be an issue with Docker instead of the ASR container.

ayancey avatar Nov 27 '23 09:11 ayancey

I could use the GPUs with the v1.2.0-gpu image. However, it always chose GPU number 0, irrespective of what I specified with --gpus: I tried all, 7, and all with -e "NVIDIA_VISIBLE_DEVICES=7", but it was always GPU 0, not 7.

Did you get this to work? Sounds like it could be an issue with Docker instead of the ASR container.

Yes, it turns out you have to write it in the format --gpus \"device=5\". It would be great if the main page included an example of this form, because it is not as obvious as --gpus all.
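For reference, a sketch of that device-pinning invocation (image tag and device index taken from earlier in the thread; adjust to your setup):

```shell
# Pin the container to a single GPU; note the nested quoting around device=5,
# which Docker requires so that "device=5" reaches the runtime intact
docker run -d --gpus '"device=5"' \
  -p 9000:9000 -e ASR_MODEL=base \
  onerahmet/openai-whisper-asr-webservice:latest-gpu
```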

DavidNemeskey avatar Nov 29 '23 10:11 DavidNemeskey

FYI for Docker Compose users: just add runtime: nvidia to the service in your compose file to let it use GPUs.
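A minimal compose sketch of that suggestion (service name and image tag are illustrative):

```yaml
services:
  whisper-asr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    runtime: nvidia   # the line being suggested
    environment:
      - ASR_MODEL=base
    ports:
      - "9000:9000"
```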

meonkeys avatar May 10 '24 17:05 meonkeys

FYI for Docker Compose users: just add runtime: nvidia to the service in your compose file to let it use GPUs.

This didn't fix the problem on my machine. Running a P2000, Nvidia Driver Version: 550.127.05, CUDA Version: 12.4

shiftylilbastrd avatar Oct 31 '24 00:10 shiftylilbastrd