VALL-E-X

Question about text tokenizer

Open BakerBunker opened this issue 1 year ago • 4 comments

I have noticed that there are two tokenizer dicts in utils/g2p, bpe_1024 and bpe_69. Which one is more suitable for the generation task in your actual training practice? Thank you.

BakerBunker avatar Sep 08 '23 08:09 BakerBunker

bpe_1024.json is never used in training. It was experimental and makes no difference in this project.

Plachtaa avatar Sep 12 '23 08:09 Plachtaa

Hi, can you give us some advice on how to make a new bpe_x.json from our own data?

yangyyt avatar Jul 28 '24 03:07 yangyyt

Oh, after reading the code I think I see how to prepare a new BPE model: convert the training data into IPA format, then train a BPE tokenizer on it to generate the bpe.json. Is that right? @Plachtaa

yangyyt avatar Jul 28 '24 05:07 yangyyt

You are right.

Plachtaa avatar Jul 29 '24 15:07 Plachtaa
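
To make the confirmed recipe concrete, here is a minimal sketch of the two steps. It assumes the phonemizer package for the IPA conversion (VALL-E-X ships its own G2P utilities in utils/g2p, so this is a stand-in) and the HuggingFace tokenizers library, whose JSON format bpe_69.json appears to follow. File names and the vocab_size are placeholders, not values from this project.

```python
# Hypothetical helper script; transcripts.txt, ipa_corpus.txt, bpe_custom.json
# and vocab_size=69 are placeholders.
from phonemizer import phonemize  # stand-in for the project's utils/g2p conversion
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

# Step 1: convert raw transcripts to IPA, one utterance per line.
with open("transcripts.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]
ipa_lines = phonemize(lines, language="en-us", backend="espeak", strip=True)
with open("ipa_corpus.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(ipa_lines))

# Step 2: train a BPE tokenizer on the IPA corpus and save it as JSON.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=69, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["ipa_corpus.txt"], trainer=trainer)
tokenizer.save("bpe_custom.json")
```

Dropping the resulting JSON into utils/g2p and pointing the loader at it should then be possible, though the vocabulary size has to stay consistent with what the model's text embedding expects.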