StyleTTS2
Mac (Metal) support?
Any chance of running this model on the unified memory of Apple silicon Macs (16GB shared between GPU and CPU)?
Same issue. It doesn't work for now. I tried running the HuggingFace space locally and got:
MPS would be available but cannot be used rn
RuntimeError: espeak not installed on your system
@yukiarimo If you just want to do inference, you can install espeak via homebrew: https://formulae.brew.sh/formula/espeak
@yl4579 I have tried and got this error:
(ai) yuki@yuki styletts2 % python app.py
NLTK
[nltk_data] Downloading package punkt to /Users/yuki/nltk_data...
[nltk_data] Package punkt is already up-to-date!
SCIPY
TORCH STUFF
START
177
MPS would be available but cannot be used rn
/Users/yuki/anaconda3/envs/ai/lib/python3.10/site-packages/torch/nn/modules/rnn.py:71: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
warnings.warn("dropout option adds dropout after all but last "
bert loaded
bert_encoder loaded
predictor loaded
decoder loaded
text_encoder loaded
predictor_encoder loaded
style_encoder loaded
diffusion loaded
text_aligner loaded
pitch_extractor loaded
mpd loaded
msd loaded
wd loaded
[nltk_data] Downloading package punkt to /Users/yuki/nltk_data...
[nltk_data] Package punkt is already up-to-date!
177
bert loaded
bert_encoder loaded
predictor loaded
decoder loaded
text_encoder loaded
predictor_encoder loaded
style_encoder loaded
diffusion loaded
text_aligner loaded
pitch_extractor loaded
mpd loaded
msd loaded
wd loaded
Traceback (most recent call last):
File "/Users/yuki/Downloads/styletts2/app.py", line 105, in <module>
btn.click(synthesize, inputs=[inp, voice, multispeakersteps], outputs=[audio], concurrency_limit=4)
TypeError: EventListenerMethod.__call__() got an unexpected keyword argument 'concurrency_limit'
I don't know this "app.py". Maybe you can ask in the repo it came from?
https://huggingface.co/spaces/styletts2/styletts2/tree/main
@yukiarimo Please ask @fakerybakery
@fakerybakery have you looked into mlx? It's a new framework from Apple. They have a separate repo for examples.
They've designed it to closely follow PyTorch's implementation, though I'm not sure exactly what this means in terms of interop. Still worth some attention!
@fakerybakery I've tried your tutorial, but I am getting the error: "espeak is not found on your system". Any ideas?
Hi, sorry, I forgot to reply. Yes, I had a similar issue. I'll check my env to see what I did to fix it and get back to you. Sorry about the delay!
@yukiarimo Did you successfully install espeak-ng with MacPorts? Can you try running:
echo 'this is a test' | espeak-ng -x -q --ipa -v en-us
@fakerybakery Yes, it's working
Output: ðɪs ɪz ɐ tˈɛst
Can you try running brew install espeak?
Also, try setting the PHONEMIZER_ESPEAK_PATH env variable to the path of your espeak-ng installation (not the binary, the installation) and PHONEMIZER_ESPEAK_LIBRARY to the binary.
If you don't know the installation path, try setting PHONEMIZER_ESPEAK_LIBRARY=/opt/local/bin/espeak-ng
Hi @yukiarimo!
Part 1: Installing espeak-ng
First, espeak on Mac is a bit tricky to install and get working with Phonemizer. Here's how I got it working:
- Install MacPorts (Brew is better, but doesn't work w/ espeak-ng)
- Install espeak-ng through MacPorts: sudo port install espeak-ng
- Phonemizer will give an error about phontab or missing data. You can resolve this by:
  - Opening the ~/.zshrc file
  - Adding export ESPEAK_DATA_PATH="/opt/local/share/espeak-ng-data" to the end of the file on a separate line
Now Phonemizer should work on Mac.
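The phontab fix above can be sanity-checked from Python before launching the demo. This is a minimal sketch, assuming the MacPorts data directory from the step above; the helper name espeak_data_dir_ok is hypothetical, and it only checks that a phontab file exists in the directory, nothing more.

```python
import os
from pathlib import Path


def espeak_data_dir_ok(data_dir):
    """Return True if data_dir looks like an espeak-ng data directory.

    Only checks for the phontab file named in the error message; this is
    a heuristic, not a full validation of the installation.
    """
    return (Path(data_dir) / "phontab").is_file()


if __name__ == "__main__":
    # Falls back to the MacPorts location from this thread when the
    # variable is not visible to the current process.
    data_dir = os.environ.get("ESPEAK_DATA_PATH", "/opt/local/share/espeak-ng-data")
    status = "looks usable" if espeak_data_dir_ok(data_dir) else "has no phontab file"
    print(data_dir, status)
```

If this reports a missing phontab file, the path in ~/.zshrc probably does not match where MacPorts put the data.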
Part 2: Resolving concurrency_limit
The issue with concurrency_limit is actually not an issue with MPS/Metal. It's an issue with the Web UI framework used for this demo, Gradio. Try running pip install -U gradio
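If upgrading is not immediately possible, one workaround is to pass concurrency_limit only when the installed Gradio accepts it. The following is a minimal sketch, not the demo's actual code: call_with_supported_kwargs is a hypothetical helper that drops keyword arguments the target callable does not declare, and old_click / new_click are stand-ins for the two listener signatures. Whether inspect.signature can introspect Gradio's event listener objects is an assumption; upgrading Gradio remains the simpler fix.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_any:
        # Keep only the keywords that func explicitly declares.
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)


# Stand-ins for an older and a newer event-listener signature.
def old_click(fn, inputs=None, outputs=None):
    return "old"


def new_click(fn, inputs=None, outputs=None, concurrency_limit=None):
    return f"new, limit={concurrency_limit}"
```

In app.py the call would then read call_with_supported_kwargs(btn.click, synthesize, inputs=[inp, voice, multispeakersteps], outputs=[audio], concurrency_limit=4).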
Part 3: About Metal/MPS
I tried modifying StyleTTS 2 to work with mps a couple weeks ago. It didn't work. PyTorch does not yet have full support for MPS, so some features StyleTTS 2 requires are still unavailable on MPS for PyTorch.
However, StyleTTS 2 is so fast that you don't really need MPS support. Even on CPU, it only takes a few seconds to generate relatively long text.
I hope PyTorch adds these features soon; however, it currently looks like it will be a while before they're available.
Also, sorry about the weird messages ("MPS would be available but cannot be used rn", "torch stuff", etc) - I was testing the code and forgot to remove the weird notes that probably make everything confusing.
Great rundown of the MPS and the related gradio web UI issue @fakerybakery! 🙏🏽 🙌🏽
It sounds like inference is sufficient for the time being then. @itsPreto makes a great suggestion with mlx. However, it is a fair amount of work to port over to it! 🤔
I have the same problem here. I installed espeak-ng through MacPorts and set the two environment variables like you said: PHONEMIZER_ESPEAK_LIBRARY=/opt/local/bin/espeak-ng PHONEMIZER_ESPEAK_PATH=/opt/local/share/espeak-ng-data, and app.py still errors out with "RuntimeError: espeak not installed on your system". Any other suggestions?
Hi, sorry, I'm out of ideas on this issue :) Could you try opening an issue on the Phonemizer library?
Hi there. I was playing with this a bit too and had similar issues. This allowed me to work around the espeak issue and run on mps/M1:
PHONEMIZER_ESPEAK_LIBRARY=/opt/homebrew/Cellar/espeak/1.48.04_1/lib/libespeak.dylib python3 styletts2_demo_libritts.py
I also used this command to locate my espeak installation: otool -L $(which espeak) | grep espeak
These insights came from reading the issue here.
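The version-specific Cellar path above will differ between machines, so it can help to search for the library instead of hard-coding it. A minimal sketch, assuming the two install prefixes mentioned in this thread (Homebrew on Apple silicon and MacPorts); find_espeak_libraries and DEFAULT_ROOTS are hypothetical names, and other install locations are not covered.

```python
from pathlib import Path

# Install prefixes mentioned in this thread; adjust for other setups.
DEFAULT_ROOTS = ("/opt/homebrew/Cellar", "/opt/local/lib")


def find_espeak_libraries(roots=DEFAULT_ROOTS):
    """Return sorted paths of libespeak*.dylib files found under roots."""
    found = []
    for root in roots:
        root_path = Path(root)
        if root_path.is_dir():
            # Matches both libespeak.dylib and libespeak-ng.dylib.
            found.extend(str(p) for p in root_path.rglob("libespeak*.dylib"))
    return sorted(found)


if __name__ == "__main__":
    for path in find_espeak_libraries():
        print(path)
```

One of the printed paths can then be used as the value of PHONEMIZER_ESPEAK_LIBRARY when starting the demo.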
By the way, I also had to change some lines like ref_tokens = torch.LongTensor(ref_tokens).to(device).unsqueeze(0) to ref_tokens = torch.LongTensor(ref_tokens).unsqueeze(0). These issues are more straightforward than the espeak one.
Hi @mparrett, what issues did you run into that required you to change this? Just curious, since it seemed to work when I ran it on an M1 Mac.
@fakerybakery Did you set device = 'mps' with no errors? Otherwise, the example notebook will probably run as-is because it selects cpu if cuda is not available. For me, after setting the device to mps I first ran into some unsupported operation and had to set PYTORCH_ENABLE_MPS_FALLBACK=1. Then there was another problem with the text encoder model. I decided to exclude it from using the mps device by modifying this line:
_ = [model[key].to(device) for key in model]
After that I had to make some changes, being careful that input tensors were going on the correct device depending on the model(s) being used, which is the reason for the change I mentioned before.
I noticed a significant speedup (~3s mps vs ~5s cpu inference) but this was a quick hack and could probably be optimized much more for the mac hardware.
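The approach described above (prefer mps, enable the CPU fallback, and keep the text encoder off mps) can be sketched roughly as follows. This is not the actual branch: pick_device, device_for, place_modules and CPU_ONLY_KEYS are hypothetical names, the "text_encoder" key is assumed from the load log earlier in the thread, and setting the fallback variable before torch is imported is a precaution rather than a documented requirement.

```python
import os

# Assumption: set early, before torch is imported, so ops that MPS does
# not support fall back to the CPU instead of raising.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

# Assumed key of the module that had problems on mps.
CPU_ONLY_KEYS = frozenset({"text_encoder"})


def pick_device(cuda_available, mps_available):
    """Prefer cuda, then mps, then cpu."""
    if cuda_available:
        return "cuda"
    return "mps" if mps_available else "cpu"


def device_for(key, device, cpu_only=CPU_ONLY_KEYS):
    """Keep selected modules on the CPU when the main device is mps."""
    return "cpu" if device == "mps" and key in cpu_only else device


def place_modules(model, device, cpu_only=CPU_ONLY_KEYS):
    """Move each module in the dict to its device; return {key: device}."""
    placement = {}
    for key, module in model.items():
        target = device_for(key, device, cpu_only)
        module.to(target)  # nn.Module.to moves parameters in place
        placement[key] = target
    return placement
```

With PyTorch installed, the device would come from pick_device(torch.cuda.is_available(), torch.backends.mps.is_available()). The returned placement shows which device each input tensor has to be moved to before it is passed to the corresponding module, which is the reason for the .to(device) changes mentioned earlier.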
Oh, yes, MPS isn't supported yet. I ran it on CPU. Thanks for the tips!
I do have the same problem here. Installed espeak-ng through macports set the two environment variables like you said: PHONEMIZER_ESPEAK_LIBRARY=/opt/local/bin/espeak-ng PHONEMIZER_ESPEAK_PATH=/opt/local/share/espeak-ng-data and still the app.py errors out with "RuntimeError: espeak not installed on your system" any other suggestions?
I am having the same issue.
So I got it running in the end. The problem seems to have been declaring the environment variables without using export (they showed up through echo, but probably weren't available for python?)
So installing espeak-ng through macports and setting these environment variables made it work:
export PHONEMIZER_ESPEAK_LIBRARY=/opt/local/lib/libespeak-ng.dylib
export PHONEMIZER_ESPEAK_PATH=/opt/local/bin/espeak-ng
Sorry for the confusion. My mistake.
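The export detail is easy to reproduce: a plain shell assignment is visible to echo in the same shell but is not passed to child processes such as python. A small sketch that demonstrates this on a POSIX system with sh available; child_sees and DEMO_ESPEAK_VAR are hypothetical names used only for illustration.

```python
import os
import subprocess


def child_sees(assignment, name="DEMO_ESPEAK_VAR"):
    """Run a shell assignment, then report what a child process sees for name."""
    # The inner `sh -c` is a separate process, just as `python app.py` would be.
    script = assignment + "; sh -c 'echo \"${" + name + ":-unset}\"'"
    # Start from an environment where the variable is definitely absent.
    env = {k: v for k, v in os.environ.items() if k != name}
    result = subprocess.run(
        ["sh", "-c", script], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Without export the child reports "unset"; with export it reports the value.
    print(child_sees("DEMO_ESPEAK_VAR=/opt/local/bin/espeak-ng"))
    print(child_sees("export DEMO_ESPEAK_VAR=/opt/local/bin/espeak-ng"))
```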
@mparrett Could you clarify the changes you made, or share your code regarding the mps device? (If I'm understanding you, you've successfully run using mps. Correct?)
Cheers.
That's right, I got this running with device == 'mps'. Happy to share my branch when I get a moment this weekend. Cheers.
Look forward to that! Thank you!
@mparrett Just checking in re: your mps implementation. Would it be easier just to enumerate the changes here in a comment, or just share a diff?
Apologies for the delay, have been out of town. Getting back tomorrow and will share my branch. Thanks!
No worries! Just looking forward to seeing your changes. Take your time. 😀
Hi there @changeling,
Finally got around to pushing these changes. My local repo is a bit of a mess and this wasn't intended for wider consumption. I prepared 3 branches for you to take a look at, with varying levels of granularity and noise :-). I left these un-rebased in case that matters to the final outcome, but they can be rebased without conflicts. Let me know if you have any questions!
https://github.com/mparrett/StyleTTS2/tree/matt-mps-squash
https://github.com/mparrett/StyleTTS2/tree/matt-mps-squash-partial
https://github.com/mparrett/StyleTTS2/tree/matt-mps