VALL-E-X
Question about text tokenizer
I noticed that there are two tokenizer dicts in utils/g2p, bpe_1024 and bpe_69. Which is more suitable for the generation task in your actual training practice? Thank you.
bpe_1024.json
is never used in training. It was experimental and makes no difference in this project.
Hi, can you give us some advice on how to make a new bpe_x.json from our own data?
Oh, after reading the code, I think I know how to prepare a new BPE: convert the training data into IPA format, then train a BPE tokenizer on it to generate the bpe.json. Is that right? @Plachtaa
You are right.
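For anyone following this thread, the confirmed recipe (IPA-convert your data, then train a BPE model and save it as JSON) can be sketched with nothing but the standard library. This is a minimal, hypothetical illustration of the BPE merge-learning loop, not the project's actual tooling; in practice you would likely use a library such as HuggingFace `tokenizers` (its `BpeTrainer` plus `Tokenizer.save`) on your IPA-transcribed lines. The corpus strings and the output file name `bpe_custom.json` below are made up for the example.

```python
import json
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merges from whitespace-split IPA (or plain-text) lines."""
    # Start with every word represented as a sequence of single symbols.
    words = Counter(tuple(w) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent-symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    vocab = sorted({sym for word in words for sym in word})
    return vocab, merges

# Hypothetical IPA-transcribed corpus (one utterance per line).
corpus = ["həloʊ wɜːld", "həloʊ ðɛər"]
vocab, merges = train_bpe(corpus, num_merges=8)
with open("bpe_custom.json", "w", encoding="utf-8") as f:
    json.dump({"vocab": vocab, "merges": [" ".join(m) for m in merges]},
              f, ensure_ascii=False)
```

The JSON layout here is a toy stand-in; a real bpe.json produced by the `tokenizers` library has its own schema, so match whatever format the training code actually loads.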