whisperX
[Feature Request] Support M1 Mac's GPU
If I pass mps to the device option, it crashes. It would be wonderful if the M1 GPU could be supported.
❯ whisperx assets/test.mp3 --device mps --model large-v2 --vad_filter --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --diarize --hf_token token --language en
torchvision is not available - cannot save figures
Performing VAD...
~~ Transcribing VAD chunk: (01:07:46.006 --> 01:07:50.663) ~~
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1] 97334 abort whisperx assets/test.mp3 --device mps --model large-v2 --vad_filter en
/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Yes, I got this error too. It seems we need to wait for an update from PyTorch.
Hi, I believe PyTorch has support for most of the functions now. Plus, it can be run with the env var PYTORCH_ENABLE_MPS_FALLBACK=1 so that functions that aren't supported yet fall back to the CPU.
I'm running pyannote and other projects with PyTorch compiled with support for MPS, so this should also be doable.
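For anyone who wants to try that route, the fallback is just an environment variable set in front of the command; adapting the invocation from the top of the thread (whether whisperx itself will accept --device mps is of course exactly what this issue is about):
PYTORCH_ENABLE_MPS_FALLBACK=1 whisperx assets/test.mp3 --device mps --model large-v2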
@Harith163 That is correct. PyTorch has implemented "mps" (Metal Performance Shaders) for at least a year now, and OpenAI's Whisper supports "mps" as well, but faster-whisper, used by WhisperX, apparently only supports "cpu" and "cuda".
I myself get "unsupported device mps"; here is the error:
self.model = ctranslate2.models.Whisper(
    model_path,
    device=device,
    device_index=device_index,
    compute_type=compute_type,
    intra_threads=cpu_threads,
    inter_threads=num_workers,
)
For me on a Mac M1, "cpu" is extremely slow, to the point that I have not been able to get a proper transcription.
Is there any workaround for this issue? I believe this is essential for Mac users.
Thanks :)
Whisper.cpp is fast on Apple Silicon ("Plain C/C++ implementation without dependencies" … "optimized via ARM NEON, Accelerate framework, Metal and Core ML"). However, I believe it only supports very rudimentary diarization currently.
Ideally, WhisperX's solutions for diarization, etc., could be made to work in the fashion of Whisper.cpp.
That's only in the ideal case, though; whisper.cpp's creator has shown interest in WhisperX's killer features but stated they are not coming any time soon.
I'd rather fix whisperx to work better on M1/Apple Silicon.
Just wanted to second this. I love whisperx on my PC, but on Mac it is just so slow.
It's resulted in fragmentation: if I want my script to be universal, I have to look elsewhere. I really wish this could be supported.
Any progress on this so far?
Any news?
It could be that ctranslate2 needs to be built on your macOS system with WITH_ACCELERATE set to ON. The owner of ctranslate2 says it may be possible here, in reference to this comment.
Thanks for the tip! ctranslate2 install instructions are here: https://opennmt.net/CTranslate2/installation.html#install-from-sources
On my macOS system, I needed to use the following for the cmake step:
cmake -DWITH_ACCELERATE=ON -DOPENMP_RUNTIME=COMP -DWITH_MKL=OFF ..
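In case it saves someone time, a rough end-to-end sequence based on the linked install-from-source instructions might look like the following (the clone location and -j value are just examples; treat this as a sketch and defer to the official docs):
git clone --recursive https://github.com/OpenNMT/CTranslate2.git
cd CTranslate2
mkdir build && cd build
cmake -DWITH_ACCELERATE=ON -DOPENMP_RUNTIME=COMP -DWITH_MKL=OFF ..
make -j8
sudo make install
# Build and install the Python wrapper against the freshly built library
cd ../python
pip install -r install_requirements.txt
python setup.py bdist_wheel
pip install dist/*.whl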
After compiling and installing ctranslate2 myself in a fresh pyenv with MPS enabled on my M1 Mac, I installed whisperx (which used my existing ctranslate2 install: Requirement already satisfied: ctranslate2<5,>=4.0 in /Users/ryan/.pyenv/versions/3.12.4/envs/ctranslate-build/lib/python3.12/site-packages (from faster-whisper==1.0.0->whisperx==3.1.1) (4.4.0)) and ran with --device mps, but still got:
Traceback (most recent call last):
File "/Users/ryan/.pyenv/versions/ctranslate-build/bin/whisperx", line 8, in <module>
sys.exit(cli())
^^^^^
File "/Users/ryan/.pyenv/versions/3.12.4/envs/ctranslate-build/lib/python3.12/site-packages/whisperx/transcribe.py", line 170, in cli
model = load_model(model_name, device=device, device_index=device_index, download_root=model_dir, compute_type=compute_type, language=args['language'], asr_options=asr_options, vad_options={"vad_onset": vad_onset, "vad_offset": vad_offset}, task=task, threads=faster_whisper_threads)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ryan/.pyenv/versions/3.12.4/envs/ctranslate-build/lib/python3.12/site-packages/whisperx/asr.py", line 288, in load_model
model = model or WhisperModel(whisper_arch,
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ryan/.pyenv/versions/3.12.4/envs/ctranslate-build/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 133, in __init__
self.model = ctranslate2.models.Whisper(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unsupported device mps
After looking at how WITH_ACCELERATE and devices are implemented in CTranslate2 itself, it seems that enabling it allows the "cpu" device backend to use MPS acceleration; you can't yet use "mps" as a separate device. So after compiling/installing CTranslate2 from source as above, then installing whisperx, invoke whisperx with:
--device cpu --compute_type float32
And it should use MPS acceleration for matrix multiplication.
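For example, a hypothetical invocation mirroring the one at the top of the thread would be:
whisperx assets/test.mp3 --device cpu --compute_type float32 --model large-v2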
NB: I only get about a 10-15% reduction in overall processing times from my experimentation on an M1 Max. Possible that other chips get better results, or that the matrix multiplication operations offloaded to MPS aren't bottlenecking the overall process, or that the CPU/MPS handoff negates some possible performance gains.
You may also want to instead install CTranslate2 from source with the cmake step:
cmake -DCMAKE_OSX_ARCHITECTURES=arm64 -DWITH_ACCELERATE=ON -DWITH_MKL=OFF -DOPENMP_RUNTIME=COMP -DWITH_RUY=ON ..
That should give you a fallback that lets you do --compute_type int8 matrix multiplication on the CPU without MPS (though I haven't had time to create a new pyenv to test this yet).
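If that build works, the corresponding invocation would presumably be (again, untested):
whisperx assets/test.mp3 --device cpu --compute_type int8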
I got this error now:
[ 38%] Linking CXX shared library libctranslate2.dylib
Undefined symbols for architecture arm64:
  "std::exception_ptr::__from_native_exception_pointer(void*)", referenced from:
      std::__1::promise<ctranslate2::TranslationResult>::~promise() in buffered_translation_wrapper.cc.o
      std::__1::promise<ctranslate2::EncoderForwardOutput>::~promise() in encoder.cc.o
      std::__1::promise<ctranslate2::GenerationResult>::~promise() in generator.cc.o
      std::__1::promise<ctranslate2::ScoringResult>::~promise() in generator.cc.o
      std::__1::promise<ctranslate2::StorageView>::~promise() in generator.cc.o
      std::__1::promise<ctranslate2::StorageView>::~promise() in wav2vec2.cc.o
      std::__1::promise<ctranslate2::StorageView>::~promise() in whisper.cc.o
      ...
  "___cxa_init_primary_exception", referenced from:
      std::__1::promise<ctranslate2::TranslationResult>::~promise() in buffered_translation_wrapper.cc.o
      std::__1::promise<ctranslate2::EncoderForwardOutput>::~promise() in encoder.cc.o
      std::__1::promise<ctranslate2::GenerationResult>::~promise() in generator.cc.o
      std::__1::promise<ctranslate2::ScoringResult>::~promise() in generator.cc.o
      std::__1::promise<ctranslate2::StorageView>::~promise() in generator.cc.o
      std::__1::promise<ctranslate2::StorageView>::~promise() in wav2vec2.cc.o
      std::__1::promise<ctranslate2::StorageView>::~promise() in whisper.cc.o
      ...
ld: symbol(s) not found for architecture arm64
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [libctranslate2.4.4.0.dylib] Error 1
make[1]: *** [CMakeFiles/ctranslate2.dir/all] Error 2
make: *** [all] Error 2
I found that the diarization function also needs to be changed to support MPS. In transcribe.py, change the following code:
diarize_model = DiarizationPipeline(use_auth_token=hf_token, device=device)
to
diarize_model = DiarizationPipeline(use_auth_token=hf_token, device='mps')
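For anyone patching this locally, a slightly more defensive sketch would request "mps" only when it is actually available (this assumes a PyTorch build that exposes the MPS availability check, and hf_token as defined in the surrounding transcribe.py code):
import torch
from whisperx.diarize import DiarizationPipeline

# Fall back to CPU when the MPS backend isn't available,
# e.g. on Intel Macs or PyTorch builds without MPS support.
diarize_device = "mps" if torch.backends.mps.is_available() else "cpu"
diarize_model = DiarizationPipeline(use_auth_token=hf_token, device=diarize_device)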
Was anyone able to run it on Mac? @colin4k, with your latest comment on changing the diarization function, were you able to make it work?
I can run it on Mac, but the transcription speed is very slow.
And after I changed the diarization function's code, diarization became much faster.