Training result
I'd like to inquire about my training results. I combined the AISHELL3 and aidata datasets with another Chinese dataset, totaling 600 hours of training data. Although the audio in these three datasets is not sampled at 24000 Hz, I set `cut_set = cut_set.resample(24000)` at line 184 of bin/tokenizer.py, so it should have been converted to 24000 Hz. I followed the documentation's instructions and trained with prefix-mode 1.
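For reference, a minimal sketch of that resampling step, assuming lhotse's `CutSet` API as used in bin/tokenizer.py (the manifest path here is hypothetical):

```python
# Minimal sketch of the resampling step, assuming lhotse's CutSet API
# as used around line 184 of bin/tokenizer.py.
from lhotse import CutSet

cut_set = CutSet.from_file("cuts_train.jsonl.gz")  # hypothetical manifest path
cut_set = cut_set.resample(24000)  # resample every recording to 24 kHz
```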
Train the AR model:

```bash
python3 bin/trainer.py --world-size 2 --max-duration 80 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 1 \
  --num-buckets 6 --dtype "bfloat16" --save-every-n 10000 --valid-interval 20000 \
  --model-name valle --share-embedding true --norm-first true --add-prenet false \
  --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 --prefix-mode 1 \
  --base-lr 0.05 --warmup-steps 200 --average-period 0 \
  --num-epochs 20 --start-epoch 1 --start-batch 0 --accumulate-grad-steps 4 \
  --exp-dir ${exp_dir}
```
Train the NAR model:

```bash
cp ${exp_dir}/best-valid-loss.pt ${exp_dir}/epoch-2.pt  # --start-epoch 3=2+1

python3 bin/trainer.py --world-size 2 --max-duration 40 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 2 \
  --num-buckets 6 --dtype "float32" --save-every-n 10000 --valid-interval 20000 \
  --model-name valle --share-embedding true --norm-first true --add-prenet false \
  --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 --prefix-mode 1 \
  --base-lr 0.05 --warmup-steps 200 --average-period 0 \
  --num-epochs 40 --start-epoch 3 --start-batch 0 --accumulate-grad-steps 4 \
  --exp-dir ${exp_dir}
```
However, when synthesizing speech from unseen data with the trained model, the following issues occur:

- The latter part of the audio prompt often appears at the beginning of the synthesized speech.
- Synthesizing long sentences leads to repeated or skipped segments in the latter part of the output.

Is there any way to improve these situations?
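One workaround I am considering for the long-sentence degradation is to split the input text into shorter sentences and synthesize them one at a time. A minimal sketch of that idea (the `synthesize` callable here is a hypothetical wrapper around the model's inference entry point, not an API of this repo):

```python
# Hedged sketch: split long input text into shorter chunks before synthesis.
# `synthesize` is a hypothetical callable wrapping the model's inference step;
# it is not part of this repository.
import re

def synthesize_in_chunks(text, synthesize, max_chars=80):
    # Split after Chinese or Western sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s.strip()]
    chunks, buf = [], ""
    for s in sentences:
        if buf and len(buf) + len(s) > max_chars:
            chunks.append(buf)
            buf = s
        else:
            buf += s
    if buf:
        chunks.append(buf)
    # Synthesize each chunk independently and return the list of waveforms.
    return [synthesize(chunk) for chunk in chunks]
```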