
Replacing the base model with something different

Open tushar-31093 opened this issue 2 years ago • 2 comments

Hi Team, kudos for this project. I tried a few open-source alternatives, and this one is honestly faster and gives reasonable results. I skimmed through all the open and closed issues and found that people are mostly discussing naturalness, etc.

I do know that using our own base models can improve this, but could you point us in that direction as well? Specifically: how do we replace the base model, and can we simply use a base model from another framework?

This could actually be quite helpful.

Thanks guys. Again. Awesome stuff 👍

tushar-31093 avatar Jan 06 '24 09:01 tushar-31093

You could refer to demo_part2.ipynb, where we use an external TTS (OpenAI TTS) as the base speaker. It involves two steps:

  1. Use this base speaker to read some sentence (>10 s recommended), and use se_extractor to extract its tone color vector.
  2. Once you have that tone color vector, you are ready to use your base speaker model as input. Use your base speaker to read a new sentence, then feed this new sentence, the already-extracted tone color vector, and the tone color vector of the target speaker into the tone color converter. That gives you the result.

We highly recommend referring to demo_part2.ipynb.
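For readers following along, the two steps above can be sketched roughly as below. This is an untested, pseudocode-style sketch based on the flow in demo_part2.ipynb; the checkpoint paths, audio file names, and the exact keyword arguments (e.g. `vad`) are assumptions, so check the notebook for the authoritative calls:

```python
from openvoice import se_extractor
from openvoice.api import ToneColorConverter

# Load the tone color converter checkpoint (path is an assumption;
# see the repository's checkpoints layout).
ckpt = 'checkpoints/converter'
converter = ToneColorConverter(f'{ckpt}/config.json', device='cuda:0')
converter.load_ckpt(f'{ckpt}/checkpoint.pth')

# Step 1: extract the tone color vector of the external base speaker
# (e.g. a >10 s clip generated by OpenAI TTS).
source_se, _ = se_extractor.get_se('base_speaker_sample.mp3', converter)

# Extract the tone color vector of the target (reference) speaker.
target_se, _ = se_extractor.get_se('reference_voice.mp3', converter)

# Step 2: have the base speaker read a NEW sentence, then convert its
# tone color from source_se to target_se.
converter.convert(
    audio_src_path='base_speaker_new_sentence.mp3',
    src_se=source_se,
    tgt_se=target_se,
    output_path='output.wav',
)
```

The key point is that the base speaker is fully swappable: any TTS that can produce an audio file can play the role of `base_speaker_sample.mp3` and `base_speaker_new_sentence.mp3`; only the tone color conversion is OpenVoice-specific.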

Zengyi-Qin avatar Jan 07 '24 20:01 Zengyi-Qin

Thanks for the response. I tried it: I generated a sample of my custom speech with OpenAI TTS, saved it, and used it in notebook part 2, following your notebook as-is. It did generate something, though to be honest I didn't fully understand the result. I see that the base speaker voice (wav/mp3) can be changed, and I did change it before running inference. What I actually want is to replace the base speaker model (the tone color part, apparently) in the gradio demo itself, and then apply my reference voice directly on top of that to generate content at scale. I can manage the text length etc. for gradio, but I can't follow the model-replacement part. Even using the notebook, which is fine, I can't follow the process of plugging the extracted features into the gradio demo for a better experience.

I understand it's a bit annoying to entertain such requests, but it's my way of saying this is a very good application and I want to use it. Some detailed guidance on what exactly to do, once we know how it's done in notebook part 2, would really be useful.
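One practical way to bridge the notebook and a long-running gradio demo, which may be what is being asked for here, is to extract each tone color vector once, save it to disk, and load it inside the gradio handler instead of re-running se_extractor on every request. A minimal sketch, assuming `target_se` is the tensor returned by `se_extractor.get_se` as in demo_part2.ipynb (file names here are hypothetical):

```python
import torch

# One-time, offline: after running se_extractor.get_se(...) as in the
# notebook, persist the extracted tone color vector.
torch.save(target_se, 'target_se.pth')

# Later, at demo startup: load the cached vector once and reuse it
# for every inference call, so only the convert step runs per request.
target_se = torch.load('target_se.pth')
```

This keeps the per-request work down to base-speaker synthesis plus one `convert` call, which is what makes generation at scale feasible.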

tushar-31093 avatar Jan 08 '24 06:01 tushar-31093