vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Results 32 vall-e issues

I trained vall-e on LibriTTS for about 100 epochs (which took almost 4 days on 8 A100 GPUs) and obtained plausible synthesized audio. Here is a demo. [1] prompt: [prompt_link](https://drive.google.com/file/d/149pHqb6TZzVwhF1vRN50H8A4AEYShpfp/view?usp=share_link)...

Could you please provide me with the specific parameter configurations in the command for training the LJSpeech dataset? Like this: python3 bin/trainer.py --max-duration 80 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 1...
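For readers unfamiliar with the flags quoted in the question above, here is a minimal sketch of what duration-based filtering and batching typically look like. It assumes that `--filter-min-duration` and `--filter-max-duration` drop utterances outside a length range and that `--max-duration` caps the total seconds of audio per batch; those semantics are an assumption, not taken from the trainer's code, and every function name below is hypothetical.

```python
from typing import Iterator, List, Tuple

# (utterance id, duration in seconds); a hypothetical stand-in for real dataset entries.
Utterance = Tuple[str, float]


def filter_by_duration(
    utts: List[Utterance], min_dur: float, max_dur: float
) -> List[Utterance]:
    """Keep only utterances whose duration lies in [min_dur, max_dur]."""
    return [u for u in utts if min_dur <= u[1] <= max_dur]


def batch_by_total_duration(
    utts: List[Utterance], max_total: float
) -> Iterator[List[Utterance]]:
    """Greedily group utterances so each batch holds at most max_total seconds."""
    batch: List[Utterance] = []
    total = 0.0
    for utt in utts:
        # Start a new batch when adding this utterance would exceed the cap.
        if batch and total + utt[1] > max_total:
            yield batch
            batch, total = [], 0.0
        batch.append(utt)
        total += utt[1]
    if batch:
        yield batch


# Made-up durations, using the values from the quoted command (0.5, 14, 80).
utts = [
    ("a", 0.3), ("b", 10.0), ("c", 12.0), ("d", 20.0), ("e", 14.0),
    ("f", 13.0), ("g", 11.0), ("h", 12.0), ("i", 13.0), ("j", 9.0),
]
kept = filter_by_duration(utts, 0.5, 14.0)
batches = list(batch_by_total_duration(kept, 80.0))
```

Under these assumptions, a smaller cap means fewer seconds of audio per batch and therefore lower peak GPU memory, at the cost of more steps per epoch.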

When I tried to tokenize WenetSpeech, I got RuntimeError: CUDA out of memory. Is it possible to tokenize on the fly?
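One common way to bound peak memory when tokenizing long recordings is to encode them in fixed-size chunks rather than all at once. The sketch below illustrates only the chunking logic; `fake_encode` is a hypothetical stand-in for a real audio tokenizer, and this is not the repository's method. With a real neural codec, codes near chunk boundaries may differ from those of a single full-length pass.

```python
from typing import Callable, Iterator, List, Sequence


def iter_chunks(samples: Sequence[int], chunk_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most chunk_size samples."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def tokenize_in_chunks(
    samples: Sequence[int],
    chunk_size: int,
    encode: Callable[[Sequence[int]], List[int]],
) -> List[int]:
    """Encode one chunk at a time so only one chunk is processed at any moment."""
    tokens: List[int] = []
    for chunk in iter_chunks(samples, chunk_size):
        tokens.extend(encode(chunk))
    return tokens


def fake_encode(chunk: Sequence[int]) -> List[int]:
    """Hypothetical tokenizer: emits a single token per chunk (the sum of its samples)."""
    return [sum(chunk)]


samples = list(range(10))
tokens = tokenize_in_chunks(samples, chunk_size=4, encode=fake_encode)
```

The chunk size trades memory for throughput: smaller chunks lower peak usage but add per-call overhead.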

  File "/home/twlan/anaconda3/envs/valle/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/twlan/anaconda3/envs/valle/lib/python3.8/site-packages/encodec/modules/seanet.py", line 63, in forward
    return self.shortcut(x) + self.block(x)
  File "/home/twlan/anaconda3/envs/valle/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/twlan/anaconda3/envs/valle/lib/python3.8/site-packages/torch/nn/modules/module.py",...

Hi, I tried to train on my dataset, but I seem to have an abnormal loss curve. Do you have any suggestions? Thanks.

The loss of AR: https://drive.google.com/file/d/1-gZJX-mwYZ-2vkKTl0dTwBcp1A8MHrmV/view?usp=drive_link
![image](https://github.com/lifeiteng/vall-e/assets/37279265/11e78b66-f71c-4039-b495-73b191fac760)...

I get an error like this: ``` 2023-10-19 10:10:09,510 INFO [infer.py:224] synthesize text: Selamat pagi 2023-10-19 10:10:09,513 WARNING [words_mismatch.py:88] words count mismatch on 500.0% of the lines (5/1) 2023-10-19 10:10:09,516...

FYI: I created a WeChat group for discussing new speech technologies. If you are interested, scan the following QR codes with your [WeChat app](https://www.wechat.com/) to join the group....

Hello, I was reading the training instructions (and the prepare dataset scripts) and I don't understand how you'd create and use custom datasets with this model.

I'd like to ask about my training results. I combined AISHELL3, aidata, and a Chinese dataset, for a total of 600 hours of training data. Although the three audio files are not...