BBC-Esq

249 comments by BBC-Esq

@signalprime It would take someone with more programming experience than me to implement, especially since I don't own a Mac, but thought I'd start the discussion anyways. Interested as always...

UPDATE: Looks like PyTorch might be getting support sooner rather than later... https://github.com/pytorch/pytorch/commit/53bfae2c066fcd06784dfa051cd7e2eb5ba5c8fa

> I'm definitely looking into it. Reviewing the Vocos model today

I'd love to learn if you want to keep me posted and teach me along the way, just FYI....

> Good thing I waited. I got a response that it should be possible [using existing ops](https://github.com/ml-explore/mlx-examples/issues/206#issuecomment-1962829362).
>
> Here is the [whisper model in MLX](https://github.com/ml-explore/mlx-examples/blob/47dd6bd17f3cc7ef95672ea16e443e58ce5eb1bf/whisper/whisper/whisper.py) format, which is used...

Not sure if it's relevant, but apparently aten::upsample_linear1d has been implemented in PyTorch's working version (not included in a release yet, though): https://github.com/pytorch/pytorch/pull/116630#issuecomment-1965380887

@signalprime how's it going? Any updates?

Hey @signalprime I hope you don't stop working on this kind of stuff even if you don't get the job with Collabora. I enjoy working with ya and look forward...

On my RTX 4090 I did some basic tests in terms of memory usage, and the quality was about the same as Bark, so maybe that'll help a little. https://github.com/collabora/WhisperSpeech/issues/68#issuecomment-1917828974...

Yeah, it'd be hard, though, because audio is much more subjective... The voice cloning seems subjective to a certain extent, but I suppose you could try to prove it...