Please make a simple gradio app that supports text to speech, 0 shot voice cloning, and true training for voice cloning

Open FurkanGozukara opened this issue 1 year ago • 12 comments

It shouldn't be hard for you. It can be ugly looking and bad coded, just works is sufficient

FurkanGozukara avatar Feb 06 '24 22:02 FurkanGozukara

Have you tried ttsdemo.themetavoice.xyz ?


vatsalaggarwal avatar Feb 06 '24 22:02 vatsalaggarwal

Have you tried ttsdemo.themetavoice.xyz ?

looking nice but i need source code to run locally

by the way 0 shot is bad as i expected

FurkanGozukara avatar Feb 06 '24 22:02 FurkanGozukara

by the way 0 shot is bad as i expected

Can you share more details - what text(s) did you try, and what voice did you use (preset / custom upload) ?

looking nice but i need source code to run locally

You can run locally by doing the following:

  1. set up env, outlined here
  2. run script for local execution, outlined here

sidroopdaska avatar Feb 06 '24 23:02 sidroopdaska

by the way 0 shot is bad as i expected

Can you share more details - what text(s) did you try, and what voice did you use (preset / custom upload) ?

looking nice but i need source code to run locally

You can run locally by doing the following:

  1. set up env, outlined here
  2. run script for local execution, outlined here

hello where is gradio?

i gave this 5 min reference file

http://sndup.net/p2ct

I got much better results with coqui voice cloning

also this is the file it generated with that 5 min reference file

https://sndup.net/r99n/

I hate that we still can't attach .wav files to GitHub replies

FurkanGozukara avatar Feb 06 '24 23:02 FurkanGozukara

Hi @sidroopdaska, thanks for the amazing project.

I tried zero-shot voice cloning with my Indian accent and could not get the accent right; the output sounded more foreign.

Can you please tell more about how to get it right for Indian accents?

Thanks, Rakesh

rozeappletree avatar Feb 07 '24 03:02 rozeappletree

Hey @INF800, we presently support zero shot voice cloning for American & British speakers only. For an Indian accent, you will need to finetune. I would recommend 1-5 mins of your voice + LoRA. Let us know if you need any help getting started with this implementation.

sidroopdaska avatar Feb 09 '24 17:02 sidroopdaska

@FurkanGozukara

gradio

https://ttsdemo.themetavoice.xyz/

reference implementation: https://github.com/metavoiceio/metavoice-src/tree/main/fam/ui
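
For readers wondering what a bare-bones Gradio front end of this kind might look like, here is a rough sketch. It is not the code in fam/ui. `synthesize` is a hypothetical placeholder that writes a sine tone instead of calling MetaVoice, and the 24 kHz sample rate is an assumption made only for the example. Only the `gr.Interface`, `gr.Textbox` and `gr.Audio` calls are real Gradio API.

```python
import math
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # assumed rate for the placeholder audio, not a MetaVoice setting


def synthesize(text, reference_audio_path=None):
    """Hypothetical stand-in for a TTS call.

    Writes a sine tone whose length grows with the text and returns the path
    of the WAV file. A real app would call the model here and could use
    `reference_audio_path` as the voice to clone.
    """
    # At least 0.2 s of audio, otherwise 0.05 s per character (integer math).
    n_samples = max(SAMPLE_RATE // 5, len(text) * SAMPLE_RATE // 20)
    frames = b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)))
        for i in range(n_samples)
    )
    out = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    out.close()
    with wave.open(out.name, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return out.name


def build_demo():
    """Wrap synthesize() in a simple Gradio UI (requires `pip install gradio`)."""
    import gradio as gr  # imported lazily so synthesize() works without gradio

    return gr.Interface(
        fn=synthesize,
        inputs=[
            gr.Textbox(label="Text to speak"),
            gr.Audio(type="filepath", label="Reference voice (optional)"),
        ],
        outputs=gr.Audio(type="filepath", label="Generated speech"),
        title="TTS demo (placeholder backend)",
    )
```

Calling `build_demo().launch()` would start the local web server. Swapping the body of `synthesize` for a real model call is the only part that depends on MetaVoice itself.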

could you share the result with xTTS so I can compare?

what do you find lacking in the speech with MetaVoice?

sidroopdaska avatar Feb 09 '24 17:02 sidroopdaska

Hey @INF800, we presently support zero shot voice cloning for American & British speakers only. For an Indian accent, you will need to finetune. I would recommend 1-5 mins of your voice + LoRA. Let us know if you need any help getting started with this implementation.

Definitely yes! If you can tell me how to get started it would be helpful.

rozeappletree avatar Feb 10 '24 07:02 rozeappletree

@sidroopdaska I'd love to train a LoRA as well. Please share any relevant pointers on how to get started.

platform-kit avatar Feb 12 '24 03:02 platform-kit

@sidroopdaska I'd love to train a LORA as well. Can't wait to integrate it into our projects. How can I get more help? My email: [email protected]

paliacci avatar Feb 19 '24 19:02 paliacci

I've added some initial pointers to this here: https://github.com/metavoiceio/metavoice-src/issues/70#issuecomment-1957337895

vatsalaggarwal avatar Feb 21 '24 17:02 vatsalaggarwal

Hey @platform-kit / @paliacci / @INF800, we just published an initial approach for finetuning the last N transformer blocks of the first stage LLM. Just a note that it'd be best to play around with the hyperparams in finetune_params.py, as we didn't determine optimal params (some people from the community were keen to contribute this portion). Let us know if you have any issues or if you're up for contributing to improving the finetuning (via param sweep or otherwise)!
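
To illustrate what "finetuning the last N transformer blocks" means in general, here is a small sketch that picks out the parameters belonging to the last N blocks by name. It is not taken from the repository. The `transformer.h.<index>.` naming pattern and the helper names are assumptions for the example, so check the real parameter names of the model before relying on it.

```python
import re

# Assumed naming scheme: block parameters look like "transformer.h.<index>.<rest>".
BLOCK_PATTERN = re.compile(r"\.h\.(\d+)\.")


def block_index(name):
    """Return the transformer block index encoded in a parameter name, or None."""
    match = BLOCK_PATTERN.search(name)
    return int(match.group(1)) if match else None


def trainable_names(param_names, last_n):
    """Return the names of parameters that sit in the last `last_n` blocks."""
    indices = [i for i in map(block_index, param_names) if i is not None]
    if not indices or last_n <= 0:
        return []
    cutoff = max(indices) - last_n + 1
    selected = []
    for name in param_names:
        idx = block_index(name)
        # Embeddings, final norms, etc. have no block index and stay frozen.
        if idx is not None and idx >= cutoff:
            selected.append(name)
    return selected
```

With a PyTorch model, one would typically loop over `model.named_parameters()` and set `param.requires_grad` to `True` only for the selected names, leaving everything else frozen.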

The next step to improve finetuning effectiveness is to have LoRA adapters for the first stage LLM, which is being worked on here.
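
For anyone new to LoRA, the general idea is to keep a pretrained weight matrix W frozen and learn a low-rank update B·A on top of it, scaled by alpha / r. The toy sketch below shows that forward pass with plain Python lists. It is a generic illustration, not the adapter code being developed for MetaVoice, and all names in it are made up for the example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute y = W x + (alpha / r) * B (A x).

    W: frozen pretrained weight, shape d x k
    A: trainable down-projection, shape r x k
    B: trainable up-projection, shape d x r (commonly initialised to zeros)
    """
    r = len(A)                         # rank of the adapter
    base = matvec(W, x)                # output of the frozen layer
    update = matvec(B, matvec(A, x))   # low-rank correction
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]
```

Because B usually starts at zero, the adapted layer initially behaves exactly like the pretrained one, and only the small A and B matrices are updated during training.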

lucapericlp avatar Mar 14 '24 13:03 lucapericlp