
CPU + MPS Support

fakerybakery opened this issue 1 year ago

Hi! Do you know if CPU and MPS support is on the roadmap? Thanks!

fakerybakery avatar Jan 19 '24 19:01 fakerybakery

CPU could be supported through whisper.cpp/llama.cpp but we are not working on that right now. MPS should work with minimal tweaks (there may be some hardcoded “cuda” settings).

jpc avatar Jan 20 '24 17:01 jpc

Nice, thanks. Do you know how much work it would take to get WhisperSpeech working with whisper.cpp?

fakerybakery avatar Jan 20 '24 22:01 fakerybakery

Adding my vote for MPS support: I'd love to use this on Macs and iOS devices.

DePasqualeOrg avatar Jan 21 '24 17:01 DePasqualeOrg

Not sure if you can run Python on iOS w/o iSH

fakerybakery avatar Jan 21 '24 18:01 fakerybakery

@fakerybakery You can try and report back how difficult it is :)

I don't have this on my roadmap right now (I am mostly focused on improving quality and language coverage), but if someone needs this, a consulting contract is a very effective way to make sure it happens.

jpc avatar Jan 22 '24 12:01 jpc

It would be great if someone added MPS support. You can't run this on a Mac, and Macs are quite often used with LLMs now.

Grzegosz avatar Jan 26 '24 17:01 Grzegosz

> CPU could be supported through whisper.cpp/llama.cpp but we are not working on that right now. MPS should work with minimal tweaks (there may be some hardcoded "cuda" settings).

I might take this one on...but first please see my recent issue about pull requests and whether you're open to source code modifications without me using a Jupyter Notebook...unless someone wants to show me how.

Basically, I'd be considering tackling:

  1. ensuring AMD GPU acceleration on Linux via ROCm (unfortunately, PyTorch doesn't support AMD GPUs on Windows). This should involve minimal changes, since ROCm presents itself as the "cuda" device within the PyTorch framework, so it'd just be a matter of double-checking the code for minor changes.

  2. ensuring MPS support, which, again, involves minor changes (adding "mps" as a viable device within PyTorch).

  3. likely adding source-code-wide changes to use "cuda", "mps", or "cpu" as the default compute device depending on a user's system (see the sketch below).
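For illustration, a minimal sketch of what that helper might look like (the name get_compute_device and the body here are only an illustration, not code from the repository):

```python
import torch

def get_compute_device() -> str:
    """Return the best available PyTorch device: CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```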

BBC-Esq avatar Feb 03 '24 12:02 BBC-Esq

Just left a response on https://github.com/collabora/WhisperSpeech/issues/73; it would be great to have MPS support.

zoq avatar Feb 03 '24 16:02 zoq

@BBC-Esq we are using nbdev. It allows you to edit either the notebooks or the .py files and later synchronize the changes.

I am on holiday next week, but afterwards I am happy to either help you set up nbdev or, if you make a PR, merge your changes back into the notebooks.

jpc avatar Feb 03 '24 20:02 jpc

Modifying WhisperSpeech to run on the torch MPS backend was not so hard: I just replaced .cuda() with .to("mps"), added map_location='mps' to a couple of torch.load calls, and removed the 'with sdp_kernel' lines. But I hit a problem with the vocoder: MPS doesn't have real x complex GEMMs (some assert fires) and complex.out is not implemented for MPS, so I need a little bit of help here.
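In code, the tweaks being described look roughly like this (the Linear module and "model.pt" path are stand-ins, not WhisperSpeech identifiers):

```python
import torch

device = "mps"  # was hard-coded as "cuda"

model = torch.nn.Linear(4, 4)  # stand-in for an actual WhisperSpeech module

# before: model = model.cuda()
model = model.to(device)

# before: state_dict = torch.load("model.pt")
state_dict = torch.load("model.pt", map_location=device)  # remap saved CUDA tensors onto MPS
```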

akorzh avatar Feb 05 '24 15:02 akorzh

Here's the pull request I did as well: https://github.com/collabora/WhisperSpeech/pull/77 Want to work together on this? I'm not that familiar with GitHub, but I think there's a way to collaborate on a pull request?

BBC-Esq avatar Feb 05 '24 17:02 BBC-Esq

Did you get it working? I made more changes and still wasn't able to run the inference example notebook. BTW, all those .py files are generated from notebooks, so you need to modify those as well.

akorzh avatar Feb 05 '24 17:02 akorzh

No, the pull request was simply to show an example of choosing between "cuda", "mps", or "cpu" based on the get_compute_device function within utils.py. I was hoping to get feedback on that approach in general (a function that dynamically determines the compute device) before modifying the other scripts. Multiple other scripts will need to be modified to set the appropriate compute device dynamically if the developer approves this approach, basically.

Also, now we're aware of the issue you raised regarding the vocoder above. I was hoping to get the "go ahead" beforehand, basically. If you want to work on this together, I'm assuming we'd work on the branch I created (the one the pull request came from)? Kind of new to GitHub...

BBC-Esq avatar Feb 05 '24 17:02 BBC-Esq

@jpc What did you think of the draft pull request? Am I on the right track, and do you want me to work on modifying the other scripts as well?

BBC-Esq avatar Feb 05 '24 17:02 BBC-Esq

Regarding Vocos and MPS, maybe it would be worth raising an issue on their GitHub and seeing what the author says? I was using this model as-is, so I am unfortunately not familiar with its internals.

If that does not help, I can try looking into it next week.

jpc avatar Feb 05 '24 18:02 jpc

The sdp_kernel is kind of important for performance on CUDA, so we'd have to figure out how to make it transparent for MPS. Maybe make a new context manager that wraps the one from PyTorch?
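A rough sketch of that wrapper idea, assuming the torch.backends.cuda.sdp_kernel context manager from PyTorch 2.x (untested):

```python
from contextlib import contextmanager

import torch

@contextmanager
def sdp_kernel_if_cuda(device: str, **flags):
    """Apply sdp_kernel flags on CUDA; act as a no-op on MPS and CPU."""
    if device == "cuda":
        with torch.backends.cuda.sdp_kernel(**flags):
            yield
    else:
        yield
```

Call sites could then use `with sdp_kernel_if_cuda(device, enable_flash=True):` unconditionally on every backend.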

jpc avatar Feb 05 '24 18:02 jpc

I'll do what I can on the draft pull request, but others will likely have to help since I don't have macOS to test on... I can at least get the overall framework in place for dynamically choosing the compute device across all scripts...

BBC-Esq avatar Feb 05 '24 18:02 BBC-Esq

OK, I got it to work on a Mac, but I had to move the vocoder and encoder to the CPU. MPS lacks support for these operators: "The operator 'aten::complex.out' is not currently implemented for the MPS device." "The operator 'aten::_fft_r2c' is not currently implemented for the MPS device."
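Schematically, the workaround is to pin the FFT-heavy module to the CPU while everything else stays on MPS (vocoder and features are placeholders here):

```python
# Pin the vocoder to the CPU: MPS lacks aten::complex.out and aten::_fft_r2c.
vocoder = vocoder.to("cpu")
audio = vocoder(features.to("cpu"))  # the FFT-heavy ops execute on the CPU
```

Setting the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable before importing torch is another option; it makes PyTorch fall back to the CPU automatically for MPS ops that are not implemented.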

akorzh avatar Feb 06 '24 03:02 akorzh

Excellent, so we've whittled it down. Can you send a screenshot of trying to run it on MPS anyway? That way I can see what the error says and try to troubleshoot. But with my revised scripts (i.e., the draft pull request), MPS works for everything except the vocoder? Thanks.

BBC-Esq avatar Feb 06 '24 04:02 BBC-Esq

I was able to find this: https://qqaatw.dev/pytorch-mps-ops-coverage/ but I couldn't find fft_r2c on there.

BBC-Esq avatar Feb 06 '24 14:02 BBC-Esq

Sorry, I didn't use your pull request, just some hacked-together code (which is quite similar, but changed in more places). I figured I needed to have something working first. Haven't you tried running on MPS yourself? I posted a couple of requests to https://github.com/pytorch/pytorch/issues/77764

akorzh avatar Feb 06 '24 15:02 akorzh

Unfortunately I don't have an Apple computer... nor Linux, for that matter. That's an extreme challenge when trying to write code that works across all three platforms, for sure. I was able to find these links, however:

https://github.com/pytorch/pytorch/pull/116630 https://developer.apple.com/documentation/metal/metal_sample_code_library/customizing_a_pytorch_operation https://github.com/neuraloperator/neuraloperator

Not sure if they'll help.

My draft pull request has all the basic infrastructure there, though. I suppose we could modify it so that only the vocoder is excluded from being loaded on MPS, but I'd like the repository owner to confirm what you've said so we know for certain, ya know?

BBC-Esq avatar Feb 06 '24 15:02 BBC-Esq

I was thinking about writing to the Vocos author, since I believe the offending operations can sometimes be changed to something a little bit different that works out of the box on MPS.
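As an illustration of that kind of rewrite (a sketch, not Vocos's actual code), a complex multiply can be expressed with purely real tensors, sidestepping aten::complex.out entirely:

```python
import torch

def complex_mul(ar: torch.Tensor, ai: torch.Tensor,
                br: torch.Tensor, bi: torch.Tensor):
    """(ar + i*ai) * (br + i*bi) using only real-valued tensor ops."""
    return ar * br - ai * bi, ar * bi + ai * br
```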

jpc avatar Feb 06 '24 18:02 jpc

Do it! @akorzh, do you have the script you used? It might help me troubleshoot.

BBC-Esq avatar Feb 07 '24 15:02 BBC-Esq

@jpc A few possible workarounds if we can't find a way to get Vocos working on MPS out of the box...

  1. Manually implement the GEMMs or the specific FFT operations using MPS primitives.

  2. Decompose the unsupported operations into smaller, supported operations.

  3. Add a context manager that automatically moves operations between CPU and MPS as appropriate, so that as much as possible runs on MPS (rough sketch after this list).

  4. Write custom kernels in the Metal Shading Language and invoke them from Python with PyObjC.

  5. Evaluate how MPS Graph within Core ML might help.

  6. Possibly use SYCL and DPC++ to write code that is portable across different GPU architectures; although primarily designed for CUDA and OpenCL, they could potentially be adapted to generate MSL code that runs on MPS through an abstraction layer.

  7. Use OpenCL/OpenGL as a fallback instead of falling back to the CPU.
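For option 3, a rough, illustrative sketch of the fallback idea (the function name and error handling are assumptions):

```python
import torch

def run_with_fallback(module: torch.nn.Module, x: torch.Tensor,
                      device: str = "mps", fallback: str = "cpu"):
    """Try a module on `device`; retry on `fallback` if a kernel is missing."""
    try:
        return module.to(device)(x.to(device))
    except (NotImplementedError, RuntimeError):
        return module.to(fallback)(x.to(fallback))
```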

Thoughts anyone?

BBC-Esq avatar Feb 07 '24 15:02 BBC-Esq

Another option might be to use Vulkan. Llama.cpp just implemented a Vulkan backend; one version came from GPT4All and another from another contributor (I forget his name). This would also allow GPU acceleration with AMD GPUs on Windows and, according to the following link, on macOS as well:

https://github.com/KhronosGroup/MoltenVK

BBC-Esq avatar Feb 07 '24 16:02 BBC-Esq

https://github.com/KhronosGroup/MoltenVK/issues/2154

BBC-Esq avatar Feb 07 '24 16:02 BBC-Esq

@jpc and @akorzh I think I may have found a solution: MLX for macOS. Here are the FFT operations it supports, and the project links:

https://ml-explore.github.io/mlx/build/html/python/fft.html https://github.com/ml-explore/mlx

Take it with a grain of salt, but according to GPT-4, MLX's mlx.core.fft.rfft matches the real-to-complex FFT that MPS is missing, so there might already be an option optimized for Apple. I leave it to your expertise. See also here for more detail:

https://ml-explore.github.io/mlx/build/html/python/_autosummary/mlx.core.fft.rfft.html#mlx.core.fft.rfft

I also ran the PyTorch description of aten::_fft_r2c through GPT, and it says they're the same:

https://pytorch.org/cppdocs/api/function_namespaceat_1aaea819b1367e99c6ef062ac8335edba2.html
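A quick way to sanity-check that claim on a Mac with mlx installed (an untested sketch):

```python
import numpy as np
import torch
import mlx.core as mx

x = np.random.randn(1024).astype(np.float32)

# PyTorch's real-to-complex FFT (aten::_fft_r2c under the hood)
ref = torch.fft.rfft(torch.from_numpy(x)).numpy()

# MLX's counterpart, which runs natively on Apple silicon
out = np.array(mx.fft.rfft(mx.array(x)))

print(np.allclose(ref, out, atol=1e-4))
```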

BBC-Esq avatar Feb 07 '24 23:02 BBC-Esq

Hey crew, I spent a few hours last night and today working on both CPU and MPS updates to this codebase. I ran into the same results as @akorzh, except that I didn't get it to run: attempting to keep everything on the CPU, I hit the "addmm_impl_cpu_" not implemented for 'Half' message inside the MultiHeadAttention.forward call. Perhaps it has to do with my environment running PyTorch version 2.1.1 at the time of testing.
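That error usually means float16 weights are being run on the CPU, where many ops (addmm included) have no Half kernels; a common workaround is to cast to float32 whenever the device isn't CUDA (device and model are placeholders):

```python
import torch

# Use half precision only on CUDA; many CPU ops have no fp16 kernels.
dtype = torch.float16 if device == "cuda" else torch.float32
model = model.to(device=device, dtype=dtype)
```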

I spent time with the [sdp_kernel](https://github.com/collabora/WhisperSpeech/blob/80b268b74900b2f7ca7a36a3c789607a3f4cd912/whisperspeech/s2a_delar_mup_wds_mlang.py#L500) line without finding a solution yet. To my understanding, PyTorch hasn't implemented Flash Attention for MPS, but there is an implementation at https://github.com/philipturner/metal-flash-attention.

Moving past that, I think if we can use functions from Vulkan or the MLX library, like @BBC-Esq pointed out, that would be best. I've not worked with these projects yet, so a lot is unfamiliar.

signalprime avatar Feb 12 '24 00:02 signalprime

patch.txt: here is my patch, which works on Mac (it runs on MPS, with the CPU for the rest).

akorzh avatar Feb 12 '24 00:02 akorzh