OpenVoice
Nothing alike
Not sure what's happening here: I managed to spin this up in the local Gradio app and recorded my own voice, but inference gave me an American-sounding output. I'm British; is that expected?
Thanks!
I tried mimicking different voices into my microphone, but the output is always the same. Is the app broken?
Okay, so a little further investigation:
I copied demo_speaker1.mp3 to someother.mp3 and dragged it into Gradio, and the voice is not cloned.
If I drag demo_speaker1.mp3 in, it works fine, so I do not think any inference is happening at all. The output is determined by filename.
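If the behavior really is keyed to the filename, my guess would be a cached speaker embedding rather than the model itself. Here is a quick way to test that hypothesis from Python (a sketch assuming the repo's se_extractor API and checkpoint layout, which may differ in your checkout):

```python
import shutil
import torch

import se_extractor
from api import ToneColorConverter

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
converter = ToneColorConverter('checkpoints/converter/config.json', device=device)
converter.load_ckpt('checkpoints/converter/checkpoint.pth')

# Same audio under two names: if the extracted embeddings differ, something
# other than the audio content (e.g. a filename-keyed cache) is in play.
shutil.copy('resources/demo_speaker1.mp3', 'resources/someother.mp3')
se_a, _ = se_extractor.get_se('resources/demo_speaker1.mp3', converter, target_dir='processed', vad=True)
se_b, _ = se_extractor.get_se('resources/someother.mp3', converter, target_dir='processed', vad=True)
print(torch.allclose(se_a, se_b))  # True expected if only the name changed
```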
I tried this in Colab and did several recordings. Result: it is not good at all. I mean it is VERY bad (in terms of likeness). Sorry, but it seems heavily biased toward certain voices that it can copy.
Just another overhyped no-good model :(
I think there are bugs in the Gradio app and the cloning is not attempted at all.
Oh no, it is definitely using the audio, at least in Colab (if the "use microphone" checkbox is checked), but the result is nothing like the original voice. It is noticeable that it is trying to copy a voice, though.
You'll notice that if you switch the files around in the samples folder, it uses the original sample's voice, even when the file has been removed.
Why does the voice in the paper samples sound perfect, but when I run it locally it sounds nothing like them? StyleTTS 2 is much better.
I installed it locally and tried my own voice, and got a standard voice instead. I tried the supplied demos, and they all came out with exactly the same standard voice. I can't figure out how to change the accent, and there are no example prompts.
Hi All - Regarding the accent, please read the paper carefully before judging the results. The accent is controlled by the base speaker model; the tone color converter does not clone your accent. This demo only provides control over emotion, and the accent defaults to American. Users can replace the base speaker model in OpenVoice with their own (e.g., a British-accent model). The OpenVoice framework provides sufficient flexibility to do this and allows users to use whatever base speaker model they have; see the sketch below.
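To make that concrete, here is a sketch of the two-stage pipeline based on the repo's demo notebook (the checkpoint paths and reference file are placeholders; a British setup would swap in your own base speaker checkpoint and its matching source embedding):

```python
import torch

import se_extractor
from api import BaseSpeakerTTS, ToneColorConverter

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Stage 1: the base speaker model. This is what determines the accent.
# Replace this checkpoint (and its source embedding below) with your own
# base speaker model if you want, say, a British accent.
base_tts = BaseSpeakerTTS('checkpoints/base_speakers/EN/config.json', device=device)
base_tts.load_ckpt('checkpoints/base_speakers/EN/checkpoint.pth')
source_se = torch.load('checkpoints/base_speakers/EN/en_default_se.pth').to(device)

# Stage 2: the tone color converter. It transfers only the timbre of the
# reference recording, not its accent.
converter = ToneColorConverter('checkpoints/converter/config.json', device=device)
converter.load_ckpt('checkpoints/converter/checkpoint.pth')
target_se, _ = se_extractor.get_se('my_reference.mp3', converter, target_dir='processed', vad=True)

base_tts.tts('This is a test.', 'tmp.wav', speaker='default', language='English', speed=1.0)
converter.convert(audio_src_path='tmp.wav', src_se=source_se,
                  tgt_se=target_se, output_path='output.wav')
```

Whatever comes out of stage 1 is what stage 2 re-voices, which is why the accent of the result follows the base speaker rather than the reference clip.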
ok
I have also tried the app as a local install, and the voices come back sounding nothing like the reference audio. They all seem to sound like a teenage boy, with slight differences when different references are used. I tried resampling the original audio, in case it needed a specific bitrate or sample rate, but this made no difference. Is there any reason the results are so different from the examples given on the website?
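(For reference, the kind of normalization I tried looks roughly like this; a sketch using librosa and soundfile, and the 22.05 kHz mono target is an assumption: the correct value is whatever the converter's config.json specifies.)

```python
import librosa
import soundfile as sf

# Load the reference clip, downmix to mono, and resample.
# 22050 Hz is an assumed target; check the model config for the real rate.
audio, sr = librosa.load('my_reference.mp3', sr=22050, mono=True)
sf.write('my_reference_22k.wav', audio, 22050)
```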
I had a similar experience to everyone above in Colab; the cloning likeness is not very good (far worse than the demo example; video here: https://youtu.be/Fx4iiy4eVoM?t=558). I am also wondering if something is wrong. I previously tested OpenAITTS plus RVC (a similar idea) with better results.
For Chinese cloning, BERT-VITS2 is still the best open solution I have found so far.
@hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice
It has the same result.
I also tested this. Same inaccuracy on several voices I've tried. None of them come close to the examples.
Same result. The output is nowhere close and sounds childish.
I'm sorry, but this model is very inaccurate. Hopefully you update it to significantly improve it, as all the voices I tried (Joe Biden, MattVidProAi, MrBeast, etc.) came out badly.
I agree the result is very disappointing; it almost feels like a scam. There are older open-source models that do it better: https://github.com/coqui-ai/TTS
You can find the answers here https://github.com/myshell-ai/OpenVoice/blob/main/QA.md
Everything seems OK, but I get this warning; not sure if it is a problem?
(openvoice) PS C:\holobot\prod\OpenVoice> & C:/Users/olivi/anaconda3/envs/openvoice/python.exe c:/holobot/prod/OpenVoice/testopenvoice2.py
Loaded checkpoint 'checkpoints/converter/checkpoint.pth'
missing/unexpected keys: [] []
[(0.0, 13.33), (13.358, 23.666), (23.886, 38.002), (38.67, 51.144)]
after vad: dur = 50.228
C:\Users\olivi\anaconda3\envs\openvoice\lib\site-packages\wavmark\models\my_model.py:25: UserWarning: istft will require a complex-valued input tensor in a future PyTorch release. Matching the output from stft with return_complex=True. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\SpectralOps.cpp:980.)
  return torch.istft(signal_wmd_fft, n_fft=self.n_fft, hop_length=self.hop_length, window=window,
Oh, no. But how do I solve this? Could you give a more detailed example?
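For what it's worth, that is a deprecation UserWarning coming from the wavmark dependency, not an error, so it should not affect the output. If you just want to silence it, a standard Python warnings filter works (a sketch; the message pattern only needs to match the warning text):

```python
import warnings

# Suppress only the torch.istft deprecation warning raised inside wavmark;
# all other warnings stay visible. Run this before importing the OpenVoice code.
warnings.filterwarnings("ignore", category=UserWarning, message=".*istft.*")
```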