OpenVoice: Nothing alike

Not sure what's happening here. I managed to spin this up in the local Gradio app and recorded my own voice, but inference gave me an American-sounding output (I'm British). Is that expected?

Thanks!

Jan 03 '24 22:01 chrisbward

I tried mimicking different voices into my microphone, but the output is always the same. Is the app broken?

Jan 03 '24 23:01 chrisbward

Okay, so after a little further investigation:

I copied demo_speaker1.mp3 to someother.mp3 and dragged it into Gradio, and the voice is not cloned.

If I drag demo_speaker1.mp3 in, it works fine, so I don't think any inference is happening at all. The result is determined by the filename.

Jan 03 '24 23:01 chrisbward

I tried this in Colab with several recordings. Result: it is not good at all. I mean it is VERY bad in terms of likeness. Sorry, but it seems to be heavily biased toward certain voices that it can copy.

Just another overhyped no-good model :(

Jan 04 '24 00:01 pixelass

I think there are bugs in the Gradio app and cloning is not being attempted at all.

Jan 04 '24 00:01 chrisbward

Oh, no, it is definitely using the audio (at least in Colab, if the "use microphone" checkbox is checked), but the result is nothing like the original voice, although you can tell it is trying to copy a voice.

Jan 04 '24 00:01 pixelass

You'll notice that if you swap the files around in the samples folder, it still uses the original sample's voice, even after that file has been removed.
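The reports above suggest the output tracks the file's name rather than its content. A minimal, hypothetical sketch of how that can happen: if extracted speaker embeddings are cached and the cache key is derived only from the file name, saving a different recording under an already-seen name silently reuses the old embedding, while keying on a hash of the audio bytes forces a fresh extraction. None of this is OpenVoice's actual code: the function names are illustrative, the cache is an in-memory dict rather than files on disk, and `extract_embedding` merely fingerprints the bytes.

```python
import hashlib
import os
import tempfile


def extract_embedding(audio_bytes: bytes) -> str:
    """Hypothetical stand-in for a real speaker-embedding extractor.

    Returns a short fingerprint of the audio so we can tell which
    recording a cached 'embedding' actually came from.
    """
    return hashlib.sha256(audio_bytes).hexdigest()[:12]


def get_embedding_by_name(audio_path: str, cache: dict) -> str:
    # Key is only the file's base name without extension: a new recording
    # saved under a previously seen name reuses the stale entry.
    key = os.path.splitext(os.path.basename(audio_path))[0]
    if key not in cache:
        with open(audio_path, "rb") as f:
            cache[key] = extract_embedding(f.read())
    return cache[key]


def get_embedding_by_content(audio_path: str, cache: dict) -> str:
    # Key combines the name with a hash of the file's bytes, so changed
    # content always yields a new key and a fresh extraction.
    with open(audio_path, "rb") as f:
        data = f.read()
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    key = f"{stem}_{hashlib.sha256(data).hexdigest()[:16]}"
    if key not in cache:
        cache[key] = extract_embedding(data)
    return cache[key]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo_speaker1.mp3")
        by_name, by_content = {}, {}

        with open(path, "wb") as f:
            f.write(b"original sample voice")
        first = get_embedding_by_name(path, by_name)
        get_embedding_by_content(path, by_content)

        # Swap a different recording in under the same file name.
        with open(path, "wb") as f:
            f.write(b"my own recording")
        print(get_embedding_by_name(path, by_name) == first)  # True: stale entry reused
        print(get_embedding_by_content(path, by_content) == first)  # False: re-extracted
```

Run as a script, the name-keyed lookup keeps returning the first recording's fingerprint after the swap, while the content-keyed lookup picks up the new audio.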

Jan 04 '24 00:01 chrisbward

Why does the voice in the paper's samples sound perfect, while when I run it locally it sounds nothing like the reference? StyleTTS 2 is much better.

Jan 04 '24 00:01 basher0

I installed it locally, tried my own voice, and got a standard voice instead. I tried the supplied demos, and they all came out as exactly the same standard voice. I can't figure out how to change the accent; there are no example prompts.

Jan 04 '24 02:01 iwoolf

Hi all. Regarding the accent, please read the paper carefully before judging the results. The accent is controlled by the base speaker model; the tone color converter does not clone your accent. This demo only provides control over emotion, and the accent defaults to American. Users can replace the base speaker model in OpenVoice with their own (for example, one with a British accent). The OpenVoice framework is flexible enough to support this and lets users use whatever base speaker model they have.
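To make that division of labor concrete, here is a toy sketch in plain Python with no audio involved. Each clip is reduced to a few labeled attributes: the base speaker stage fixes accent and emotion, and the converter swaps only the tone color. The class, field, and function names are hypothetical and do not correspond to OpenVoice's real classes; "tone color" here stands in for the speaker embedding the converter transfers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Speech:
    """Toy stand-in for an audio clip, described by a few attributes."""
    text: str
    accent: str      # decided by the base speaker model
    emotion: str     # style control exposed by the base speaker model
    tone_color: str  # timbre; the only attribute the converter changes


def base_speaker_tts(text: str, accent: str, emotion: str) -> Speech:
    # Hypothetical base speaker model: fixes accent and emotion and
    # speaks in its own default tone color.
    return Speech(text, accent, emotion, tone_color="base-speaker")


def convert_tone_color(src: Speech, target_tone_color: str) -> Speech:
    # Hypothetical tone color converter: swaps timbre only; accent and
    # emotion pass through unchanged.
    return replace(src, tone_color=target_tone_color)


if __name__ == "__main__":
    # Even with a British reference, only its tone color is transferred.
    reference_tone = "my-voice"
    out = convert_tone_color(
        base_speaker_tts("Hello there", accent="American", emotion="cheerful"),
        reference_tone,
    )
    print(out)
    # Speech(text='Hello there', accent='American', emotion='cheerful', tone_color='my-voice')

    # Swapping in a (hypothetical) British base speaker changes the accent.
    out_uk = convert_tone_color(
        base_speaker_tts("Hello there", accent="British", emotion="cheerful"),
        reference_tone,
    )
    print(out_uk.accent)  # British
```

In this model, getting a British-accented result means changing the first stage; no choice of reference clip for the second stage can change the accent.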

Jan 04 '24 02:01 Zengyi-Qin

ok

Jan 04 '24 03:01 Tpann2518

I have also tried the app as a local install, and the voices come back sounding nothing like the reference audio. They all sound like a teenage boy, with slight differences when different references are used. I tried resampling the original audio in case it needed a specific bitrate (kbit/s) or sample rate (Hz), but this made no difference. Is there any reason the results are so different from what I expected from the examples on the website?

Jan 04 '24 12:01 davechilds

I had a similar experience to everyone above in Colab: the cloning likeness is not very good, far worse than the demo example (video here: https://youtu.be/Fx4iiy4eVoM?t=558). I am also wondering whether something is wrong. I previously tested OpenAI TTS plus RVC (a similar idea) and got better results.

For Chinese cloning, so far I have found that BERT-VITS2 is still the best open solution.

Jan 04 '24 17:01 hehuan2363

@hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

Jan 04 '24 19:01 Zengyi-Qin

> @hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

It gives the same result.

Jan 04 '24 20:01 YKefasu

> @hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

I also tested this. Same inaccuracy on the several voices I tried. None of them come close to the examples.

Jan 05 '24 02:01 pixelass

> @hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

Same result. The output is nowhere close and sounds childish.

Jan 05 '24 11:01 Aegon95

I'm sorry, but this model is very inaccurate. Hopefully you will update it to improve it significantly, as all the voices I tried came out bad: Joe Biden, MattVidProAi, MrBeast, etc.

Jan 05 '24 15:01 DKRacingFan

I agree, the result is very disappointing; it almost feels like a scam. There are older open-source models that do it better: https://github.com/coqui-ai/TTS

Jan 06 '24 14:01 RASPIAUDIO

You can find the answers here https://github.com/myshell-ai/OpenVoice/blob/main/QA.md

Jan 07 '24 21:01 Zengyi-Qin

> You can find the answers here https://github.com/myshell-ai/OpenVoice/blob/main/QA.md

Everything seems OK, but I get this warning. Not sure whether it is a problem?

(openvoice) PS C:\holobot\prod\OpenVoice> & C:/Users/olivi/anaconda3/envs/openvoice/python.exe c:/holobot/prod/OpenVoice/testopenvoice2.py
Loaded checkpoint 'checkpoints/converter/checkpoint.pth'
missing/unexpected keys: [] []
[(0.0, 13.33), (13.358, 23.666), (23.886, 38.002), (38.67, 51.144)]
after vad: dur = 50.228
C:\Users\olivi\anaconda3\envs\openvoice\lib\site-packages\wavmark\models\my_model.py:25: UserWarning: istft will require a complex-valued input tensor in a future PyTorch release. Matching the output from stft with return_complex=True. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\SpectralOps.cpp:980.)
  return torch.istft(signal_wmd_fft, n_fft=self.n_fft, hop_length=self.hop_length, window=window,
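The warning text itself describes a future PyTorch change to the input that `torch.istft` accepts, raised from the wavmark package. If you only want to hide that one message, a minimal sketch using Python's standard `warnings` module follows. It assumes, as the `UserWarning:` prefix in the log suggests, that the message goes through Python's warnings machinery and that its wording stays as printed above; the helper name `silence_istft_notice` is illustrative.

```python
import warnings

# Regex matched against the start of the warning text (case-insensitive).
ISTFT_NOTICE = r"istft will require a complex-valued input tensor"


def silence_istft_notice() -> None:
    """Ignore only the torch.istft UserWarning quoted in the log above.

    Other UserWarnings still surface; exceptions are unaffected.
    """
    warnings.filterwarnings("ignore", message=ISTFT_NOTICE, category=UserWarning)


if __name__ == "__main__":
    # Self-check: the matching warning is dropped, an unrelated one is kept.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default...
        silence_istft_notice()           # ...except the istft notice (inserted first)
        warnings.warn("istft will require a complex-valued input tensor in a "
                      "future PyTorch release.", UserWarning)
        warnings.warn("some other warning", UserWarning)
    print([str(w.message) for w in caught])  # ['some other warning']
```

Calling `silence_istft_notice()` once near the top of a script, before inference, only affects how this message is displayed; it does not change the audio produced.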

Jan 08 '24 09:01 RASPIAUDIO

Oh, no. But how do I solve this? Could you give an example in more detail?

Jan 15 '24 06:01 jackyin68
