vall-e issues

Training instructions from README.md not working for me

4

Hello, I am working with Ubuntu 22, an NVIDIA RTX 3080, and 64GB RAM. I followed the steps of the demo in the README.md to train a model on LibriTTS...

davidmartinrius

G2p bigcidian

1

### Add bilingual Mandarin and English support using BigCiDian
1. Add a phonemize backend *G2PBackend*
2. Import compiled [BigCiDian](https://github.com/speechio/BigCiDian) into g2p to support bilingual Mandarin and English
3. Add *userdict.txt*...

sanqianyuejia
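The PR describes a lexicon-backed G2P with an added user dictionary. A minimal sketch of how such a lookup could give user entries priority over the main lexicon follows, assuming a hypothetical whitespace-separated `word phone1 phone2 ...` format. The real *userdict.txt* format, the `G2PBackend` internals, and the BigCiDian file layout are not shown in the issue; `load_lexicon`, `phonemize_word`, and the phone symbols are illustrative only.

```python
from typing import Dict, List, Optional


def load_lexicon(lines: List[str]) -> Dict[str, List[str]]:
    """Parse hypothetical 'word phone1 phone2 ...' lines into a dict."""
    lexicon: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0], parts[1:]
        # Keep the first pronunciation seen for each word.
        lexicon.setdefault(word.lower(), phones)
    return lexicon


def phonemize_word(
    word: str, user: Dict[str, List[str]], main: Dict[str, List[str]]
) -> Optional[List[str]]:
    """Look the word up in the user dictionary first, then the main lexicon."""
    key = word.lower()
    if key in user:
        return user[key]
    return main.get(key)  # None signals an out-of-vocabulary word


main_lex = load_lexicon(["你好 n i3 h ao3", "hello HH AH0 L OW1"])
user_lex = load_lexicon(["hello HH EH0 L OW1"])
print(phonemize_word("Hello", user_lex, main_lex))  # ['HH', 'EH0', 'L', 'OW1']
print(phonemize_word("你好", user_lex, main_lex))  # ['n', 'i3', 'h', 'ao3']
```

Checking the user dictionary first lets a small hand-edited file override the compiled lexicon without rebuilding it.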

BREAK CHANGES

### 4.14
- https://github.com/lifeiteng/vall-e/pull/85 Refactored TextTokenizer
  - [code change](https://github.com/lifeiteng/vall-e/pull/85/files#diff-db0bfc2a9604102b98361aae3174bd5d2e7027e44bebf3d592e16a6f4d543581R152) and [test](https://github.com/lifeiteng/vall-e/pull/85/files#diff-91b6947dde6b1a2132060367c398eab274c2c45382591f46f5088eebe8fe733eR28)
  - before `two -> t u ː`, after `two -> t uː`

### 4.xx

lifeiteng
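The before/after example suggests that the refactor attaches the IPA length mark `ː` to the preceding phone instead of emitting it as a separate token. A minimal sketch of such a post-processing step, assuming the phonemizer output has already been split into single-symbol tokens; `merge_length_marks` is a hypothetical helper, not the code from PR #85.

```python
from typing import List

LENGTH_MARK = "\u02d0"  # IPA length mark "ː" (MODIFIER LETTER TRIANGULAR COLON)


def merge_length_marks(tokens: List[str]) -> List[str]:
    """Attach each standalone length mark to the token before it."""
    merged: List[str] = []
    for tok in tokens:
        if tok == LENGTH_MARK and merged:
            merged[-1] += tok  # e.g. "u" + "ː" -> "uː"
        else:
            # Regular token, or a length mark with nothing before it (kept as-is).
            merged.append(tok)
    return merged


print(merge_length_marks(["t", "u", LENGTH_MARK]))  # ['t', 'uː']
```

One consequence of this kind of change is that the token vocabulary differs from the old one, which is why it is listed as a breaking change for previously prepared data.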

Which is better, training all modules in one stage or training stage 1 and stage 2 separately?

There is a `train-stage` option in trainer.py. In egs/libritts, there are two training procedures with different `train-stage` options. Which is better in terms of synthesis results?

zhouyong64

Inference

3

Inference results differ between runs with the same config!

ghost
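One common source of run-to-run differences, assuming the decoder samples tokens rather than decoding greedily, is the random number generator. The sketch below uses a hypothetical `top_k_sample` helper over plain Python lists to show that top-k sampling is reproducible only when the generator is seeded. It is not the vall-e inference code; in a PyTorch pipeline the analogous step is calling `torch.manual_seed` before inference, and some GPU kernels can add further nondeterminism.

```python
import math
import random
from typing import List


def top_k_sample(logits: List[float], k: int, rng: random.Random) -> int:
    """Sample an index from the k highest-scoring logits via softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract max for numerical stability
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def decode(seed: int, steps: int = 10) -> List[int]:
    """Generate a toy token sequence; the same seed yields the same output."""
    rng = random.Random(seed)  # independent, explicitly seeded generator
    logits = [0.1, 2.0, 1.5, 0.3, 1.9]
    return [top_k_sample(logits, k=3, rng=rng) for _ in range(steps)]


assert decode(seed=42) == decode(seed=42)  # reproducible with a fixed seed
```

Without a fixed seed, two runs with an identical config draw different random numbers and can therefore emit different token sequences.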

AISHELL1 with cut_set.normalize_loudness

I used `cut_set.normalize_loudness` because the loudness of the AISHELL audio files is low, see https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173
```
if args.prefix == "aishell":
    # NOTE: the loudness of aishell audio files is around -33
    #...
```

lifeiteng
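Loudness normalization boils down to applying a constant gain equal to the difference between the target and the measured loudness. A minimal arithmetic sketch, assuming the -33 figure is an integrated loudness in LUFS and using a hypothetical target of -20 LUFS (the actual target in tokenizer.py is elided above); this is not lhotse's implementation of `normalize_loudness`.

```python
def loudness_gain(measured_lufs: float, target_lufs: float) -> float:
    """Return the linear amplitude factor that moves measured loudness to target."""
    gain_db = target_lufs - measured_lufs  # e.g. -20 - (-33) = +13 dB
    return 10 ** (gain_db / 20)  # dB -> amplitude ratio


factor = loudness_gain(-33.0, -20.0)
print(round(factor, 2))  # 4.47
```

Every sample would then be multiplied by this factor; a boost of this size can push peaks past full scale, so clipping is worth checking afterwards.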

Question about loss calculation in AR model without handling mask

1

Regarding the loss calculation in the AR model, why isn't the mask being handled?
```
total_loss = F.cross_entropy(logits, targets, reduction=reduction)
```
Normally, shouldn't it be:
```
total_loss = F.cross_entropy(logits.mask_selected(y_mask),...
```

hertz-pj
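The question is whether padded target positions contribute to the loss. A minimal pure-Python sketch of masked cross-entropy, averaging only over positions where the mask is true, follows. `masked_cross_entropy` is a hypothetical helper, and whether the vall-e AR targets actually contain padding that needs masking is not settled here. In PyTorch the same effect is commonly obtained by setting padded targets to the `ignore_index` of `F.cross_entropy` (default -100).

```python
import math
from typing import List


def masked_cross_entropy(
    logits: List[List[float]], targets: List[int], mask: List[bool]
) -> float:
    """Mean negative log-likelihood over positions where mask is True."""
    total, count = 0.0, 0
    for row, tgt, keep in zip(logits, targets, mask):
        if not keep:
            continue  # padded position: excluded from the loss
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # log-sum-exp
        total += log_z - row[tgt]  # -log softmax(row)[tgt]
        count += 1
    return total / count if count else 0.0


logits = [[2.0, 0.0], [0.0, 2.0], [5.0, -5.0]]
targets = [0, 1, 1]  # last position is padding
mask = [True, True, False]
print(round(masked_cross_entropy(logits, targets, mask), 4))  # 0.1269
```

Without the mask, the badly mispredicted padding position would dominate the average (roughly 3.42 instead of 0.13 here).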

libritts: AssertionError: No recordings left after fixing the manifests.

When preparing the LibriTTS dataset, I run into this issue:
```
Scanning audio files (*.wav): 0it [00:00, ?it/s]
Preparing LibriTTS parts:  71%|███████▊   | 5/7 [00:00
```

Yixing-Li
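The log shows `Scanning audio files (*.wav): 0it`, i.e. no `.wav` files were found, which leaves nothing to match against the manifests. A quick sanity check, assuming the corpus was downloaded and extracted to a local directory (the path below is hypothetical), is to count the `.wav` files per LibriTTS part before rerunning preparation. `count_wavs` is an illustrative helper, not part of vall-e or lhotse.

```python
from collections import Counter
from pathlib import Path


def count_wavs(corpus_dir: str) -> Counter:
    """Count .wav files under each top-level subdirectory (e.g. one per LibriTTS part)."""
    root = Path(corpus_dir)
    counts: Counter = Counter()
    if not root.is_dir():
        return counts  # wrong path: nothing to scan
    for wav in root.rglob("*.wav"):
        part = wav.relative_to(root).parts[0]
        counts[part] += 1
    return counts


# Hypothetical location; adjust to wherever LibriTTS was extracted.
print(count_wavs("download/LibriTTS"))
```

An empty result points to a wrong path or an incomplete extraction rather than to the manifest-fixing step itself.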

add_prenet=False?

Why should `add_prenet` be set to False? In my experiments False is indeed better than True, but I do not understand why. Can you help...

yangyyt

Inference error

When I use my own audio as the prompt for inference, I get this error:
```
raise SyntaxError(
SyntaxError: well trained model shouldn't reach here.
```

decajcd
