BBC-Esq comments

Results: 249 comments of BBC-Esq

possibly use MLX for MacOS users with WhisperSpeech

@signalprime It would take someone with more programming experience than me to implement, especially since I don't own a Mac, but thought I'd start the discussion anyways. Interested as always...

possibly use MLX for MacOS users with WhisperSpeech

UPDATE: Looks like PyTorch might be getting support sooner rather than later... https://github.com/pytorch/pytorch/commit/53bfae2c066fcd06784dfa051cd7e2eb5ba5c8fa

possibly use MLX for MacOS users with WhisperSpeech

> I'm definitely looking into it. Reviewing the Vocos model today

I'd love to learn if you want to keep me posted and teach me along the way, just FYI....

possibly use MLX for MacOS users with WhisperSpeech

Interesting...

possibly use MLX for MacOS users with WhisperSpeech

> Good thing I waited. I got a response that it should be possible [using existing ops](https://github.com/ml-explore/mlx-examples/issues/206#issuecomment-1962829362).
>
> Here is the [whisper model in MLX](https://github.com/ml-explore/mlx-examples/blob/47dd6bd17f3cc7ef95672ea16e443e58ce5eb1bf/whisper/whisper/whisper.py) format, which is used...

possibly use MLX for MacOS users with WhisperSpeech

Not sure if it's relevant, but apparently aten::upsample_linear1d has been implemented in PyTorch's development version (not included in a release yet, though): https://github.com/pytorch/pytorch/pull/116630#issuecomment-1965380887

possibly use MLX for MacOS users with WhisperSpeech

@signalprime how's it going? Any updates?

possibly use MLX for MacOS users with WhisperSpeech

Hey @signalprime I hope you don't stop working on this kind of stuff even if you don't get the job with Collabora. I enjoy working with ya and look forward...

What is the performance of WhisperSpeech?

On my RTX 4090 I did some basic tests in terms of memory usage, and the quality was about the same as Bark, so maybe that'll help a little. https://github.com/collabora/WhisperSpeech/issues/68#issuecomment-1917828974...

What is the performance of WhisperSpeech?

Yeah, it'd be hard though, because audio is much more subjective... The voice cloning seems subjective to a certain extent, but I suppose you could try to prove it...