
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

8 PL-BERT issues

I'm having trouble running the [preprocess jupyter notebook](https://github.com/yl4579/PL-BERT/blob/main/preprocess.ipynb) you provided. I was trying to create PL-BERT for the Slovak language, but even when I run the code as provided,...

I'm trying to train PL-BERT for Vietnamese (using a multilingual BERT-based model with the wiki-vi dataset), but the vocab loss is 0.0 (from the very first step). Is that okay? @yl4579 Step [19920/1000000], Loss: 0.33009,...

TL;DR:

* Encountering frequent NaN values, mainly for the loss, during training with [a large JPN dataset](https://huggingface.co/datasets/oshizo/japanese-wikipedia-paragraphs) (10.5 million rows).
* No such issues with another, albeit [smaller dataset](https://huggingface.co/datasets/range3/wiki40b-ja) (800,000...
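One common guard against occasional NaN losses on a large, noisy corpus is to check the loss before the optimizer step and skip the offending batch. The sketch below is illustrative and not from the PL-BERT training loop; `safe_training_step` is a hypothetical helper, and in a real loop the value would come from `loss.item()` before `backward()`.

```python
import math

def safe_training_step(loss_value, skip_log):
    """Return True if it is safe to backprop this batch.

    If the loss is NaN or Inf, record it and tell the caller to skip
    optimizer.step() for this batch instead of corrupting the weights.
    """
    if not math.isfinite(loss_value):
        skip_log.append(loss_value)
        return False
    return True
```

A usage pattern would be: if `safe_training_step(loss.item(), skipped)` returns `False`, `continue` to the next batch; inspecting `skipped` afterwards helps tell occasional bad samples from a diverging run.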

Fix for https://github.com/yl4579/PL-BERT/issues/29 and support for Malayalam.

```
text = 'hello (1200 - 1230)'
out = normalize_text(text)
print(out)
# hello (one thousand two hundred to one thousand two hundred thirty)
```
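The range expansion shown above can be sketched with a small self-contained number-to-words helper plus a regex substitution. This is not the repo's `normalize_text`; `number_to_words` and `expand_number_ranges` are hypothetical names, and the converter only handles integers below one million.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer 0..999999 in English words (no 'and', no commas)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("" if n % 10 == 0 else " " + ONES[n % 10])
    if n < 1000:
        s = ONES[n // 100] + " hundred"
        return s if n % 100 == 0 else s + " " + number_to_words(n % 100)
    s = number_to_words(n // 1000) + " thousand"
    return s if n % 1000 == 0 else s + " " + number_to_words(n % 1000)

def expand_number_ranges(text):
    """Rewrite 'A - B' number ranges as '<words> to <words>'."""
    return re.sub(
        r"(\d+)\s*-\s*(\d+)",
        lambda m: f"{number_to_words(int(m.group(1)))} to "
                  f"{number_to_words(int(m.group(2)))}",
        text,
    )
```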

I saw issues about this error (#28), but I don't know how to solve it. I don't know how to write code that skips the error. Can you...
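A common way to "skip the error" during preprocessing is to wrap the failing call in try/except and log the offending samples instead of crashing. This is only a sketch under the assumption that the error is raised per sentence; `phonemize_safe` and `phonemize_fn` are hypothetical names standing in for whatever G2P/phonemization call fails.

```python
def phonemize_safe(sentences, phonemize_fn):
    """Apply phonemize_fn to each sentence, skipping any that raise.

    Returns (processed, skipped) where skipped holds (index, error) pairs
    so the dropped samples can be inspected later.
    """
    processed, skipped = [], []
    for i, sentence in enumerate(sentences):
        try:
            processed.append(phonemize_fn(sentence))
        except Exception as exc:  # noqa: BLE001 - deliberately broad here
            skipped.append((i, repr(exc)))
    return processed, skipped
```

Checking `len(skipped)` against the corpus size tells you whether the failures are rare outliers (safe to drop) or a systematic bug worth fixing upstream.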

https://github.com/yl4579/PL-BERT/blob/592293aabcb21096eb7f5bffad95a3d38ba4ae6c/dataloader.py#L83 Hi, why is masked_index extended for 15% of tokens? If I understand correctly, the extension should be placed inside the else statement at line #80, right?
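For context, one common reading is that this follows standard BERT masking: 15% of positions are selected as prediction targets, and of those 80% become [MASK], 10% become a random token, and 10% stay unchanged. All selected positions are still targets, which is why extending masked_index outside the 80% branch can be intentional. The sketch below is illustrative, not the repo's dataloader; MASK_ID and the vocabulary size are made-up values.

```python
import random

MASK_ID = 103       # hypothetical [MASK] token id
VOCAB_SIZE = 1000   # hypothetical vocabulary size

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """BERT-style masking sketch.

    Every selected position goes into masked_index (it is a prediction
    target) even when the token is replaced randomly or left unchanged.
    """
    rng = random.Random(seed)
    ids = list(token_ids)
    masked_index = []
    for i in range(len(ids)):
        if rng.random() < mask_prob:
            masked_index.append(i)          # target regardless of branch below
            r = rng.random()
            if r < 0.8:
                ids[i] = MASK_ID            # 80%: replace with [MASK]
            elif r < 0.9:
                ids[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # else: 10% keep the original token
    return ids, masked_index
```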

Do you have any suggestions for Chinese data preprocessing? For example, text normalization, g2p, etc. From your experience, will the accuracy of the g2p model have a great impact on the...