vall-e
Has anyone tried this repo on other languages and gotten good performance?
Has anyone tried this repo on other languages and gotten good performance? 50 hours of toy data didn't seem to produce intelligible speech.
Hi, @MisakaMikoto96 Sorry, but did you manage to get good results with English? I cannot generate audio using a model trained on some data; the result is just noise, not voice. Looking forward to your reply. Regards! Petar
I only tried it on a tiny 1-hour Mandarin dataset, and I set the input prompt to be the utterance itself (i.e., not using self.sample_prompts in data.py). I got a human voice because the model overfit my dataset (tested by feeding in a transcription and its matching audio; the model was able to reconstruct the audio). @enhuiz's reproduction seems somewhat different from the paper. May I ask why sample_prompts is in the data processing and only selects the qnt, not the <phn, qnt> pair?
Also, at the inference stage, the paper prefers to input "text_prompt" + "text_to_be_gen" + "audio_prompt"; is there any explanation of this in your code?
Many thanks for your work!
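For readers following the thread, here is a minimal sketch of the behavior being discussed, assuming sample_prompts draws a same-speaker quantized clip as the acoustic prompt; the signature, qnt_paths_by_speaker, and max_frames are illustrative assumptions, not the repo's actual code:

```python
# Rough paraphrase (an assumption, not the repo's exact code) of what
# sample_prompts in data.py appears to do: draw an acoustic prompt as
# quantized codes (qnt) from another utterance of the same speaker,
# without pairing it with that utterance's phonemes (phn).
import random
from pathlib import Path

import torch


def sample_prompts(qnt_paths_by_speaker: dict[str, list[Path]],
                   speaker: str, exclude: Path,
                   max_frames: int = 225) -> torch.Tensor:
    """Pick a random same-speaker qnt clip to use as the acoustic prompt."""
    candidates = [p for p in qnt_paths_by_speaker[speaker] if p != exclude]
    path = random.choice(candidates) if candidates else exclude
    qnt = torch.load(path)  # shape: (T, n_quantizer_levels)
    # Crop a random window so the prompt has bounded length.
    if qnt.shape[0] > max_frames:
        start = random.randint(0, qnt.shape[0] - max_frames)
        qnt = qnt[start:start + max_frames]
    return qnt  # only qnt is returned; no <phn, qnt> pair
```

If the sampling really works this way, the prompt's transcription never enters the model, which is exactly the <phn, qnt> question raised above.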
Hi, @MisakaMikoto96 Thanks for your message. Sorry, but would you mind sharing your code (data.py, config.py, ar.yml)? Looking forward to hearing from you. Best Regards! Petar
I have the same confusion. But I found that this implementation lets us infer from the target text alone, rather than concatenating the prefix text + target text as described in the paper.

> Also, at the inference stage, the paper prefers to input "text_prompt" + "text_to_be_gen" + "audio_prompt"; is there any explanation of this in your code?

This can lead to a problem when your acoustic prompt is not consistent with the prefix text.
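To make the mismatch concrete, here is a minimal sketch of the two inference setups; the ar_model.generate interface is hypothetical, and only the ordering of the inputs reflects the paper versus this repo:

```python
# Hedged sketch of the two inference setups; ar_model.generate is a
# hypothetical interface, and only the input ordering reflects the
# paper versus this repo.
import torch


def infer_paper_style(ar_model, prefix_phn: torch.Tensor,
                      target_phn: torch.Tensor,
                      prompt_qnt: torch.Tensor) -> torch.Tensor:
    """VALL-E paper: phonemes = prefix text + target text.

    prompt_qnt must be the audio *of* prefix_phn, so the text and
    acoustic prompts stay consistent by construction.
    """
    phn = torch.cat([prefix_phn, target_phn])
    return ar_model.generate(phn, prompt_qnt)


def infer_repo_style(ar_model, target_phn: torch.Tensor,
                     prompt_qnt: torch.Tensor) -> torch.Tensor:
    """This repo: phonemes = target text only, plus an acoustic prompt.

    The prompt's transcription is never given to the model, so a
    mismatch between the prompt audio and any implied prefix text goes
    undetected, which is the inconsistency pointed out above.
    """
    return ar_model.generate(target_phn, prompt_qnt)
```

In the paper-style call, the acoustic prompt is the audio of the prefix text, so text and audio stay aligned; in the repo-style call there is no prefix text, so nothing enforces that consistency.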