Aaron (Yinghao) Li

110 comments by Aaron (Yinghao) Li

@yihuitang In my case, I separated words because I used a PL-BERT trained jointly on Chinese, Japanese, and English, and word boundaries were used when pre-training that PL-BERT, but...
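To make "separating words" concrete, here is a minimal sketch of flattening per-word phoneme sequences into one token stream while keeping word boundaries. The boundary token (a single space), the function name `flatten_with_boundaries`, and the `-1` word index for boundary tokens are all hypothetical choices for illustration, not the actual PL-BERT preprocessing.

```python
from typing import List, Tuple

BOUNDARY = " "  # hypothetical word-boundary token

def flatten_with_boundaries(words: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten per-word phoneme lists, inserting a boundary token between words.

    Returns the phoneme tokens and, for each token, the index of the word it
    belongs to (boundary tokens are marked with -1).
    """
    tokens: List[str] = []
    word_ids: List[int] = []
    for i, phonemes in enumerate(words):
        if i > 0:
            # Mark the boundary between the previous word and this one.
            tokens.append(BOUNDARY)
            word_ids.append(-1)
        tokens.extend(phonemes)
        word_ids.extend([i] * len(phonemes))
    return tokens, word_ids
```

For example, `flatten_with_boundaries([["n", "i"], ["h", "ao"]])` yields the tokens `["n", "i", " ", "h", "ao"]` and word indices `[0, 0, -1, 1, 1]`, so a word-level model can still recover which phonemes belong together.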

@yihuitang `n_prods` should be the number of tones (e.g., for Mandarin Chinese it should be 5, for Japanese it should be 2, for Cantonese it should be 6). The tones...
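As a rough illustration of how `n_prods` relates to the tone labels, the sketch below stores the per-language tone counts given above and checks that every tone ID fits in the range `[0, n_prods)`. It assumes tone IDs are 0-indexed (e.g. Mandarin tones 1 to 5 stored as 0 to 4); the dictionary keys and the helper `check_tone_ids` are hypothetical.

```python
# Number of tone classes per language, as stated above.
# Assumption: tone IDs are 0-indexed, so valid IDs are 0 .. n_prods - 1.
N_TONES = {"mandarin": 5, "japanese": 2, "cantonese": 6}

def check_tone_ids(tone_ids, language):
    """Return n_prods for `language`, raising if any tone ID is out of range."""
    n_prods = N_TONES[language]
    bad = [t for t in tone_ids if not 0 <= t < n_prods]
    if bad:
        raise ValueError(
            f"tone IDs {bad} out of range for {language} (n_prods={n_prods})"
        )
    return n_prods
```

A check like this catches the common mistake of setting `n_prods` lower than the largest tone label actually present in the data.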

@JohnHerry I'm a little confused about what the "prosody segment" in #2 is. I guess the issue (#2) only involves multilingual support for phonemization; I'm not sure why it is related to prosody...

Sorry for the late reply. I have been pretty busy recently. This is likely due to a misconfiguration [here](https://github.com/yl4579/PitchExtractor/blob/main/Configs/config.yml#L16). The F0 loss and the silence loss are not well balanced, so...
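The balance issue can be pictured as a weighted sum of two losses. The sketch below is an assumption, not the PitchExtractor code: it uses an L1 loss on F0 and binary cross-entropy on silence logits, with hypothetical weights `lambda_f0` and `lambda_sil` standing in for whatever the linked config line controls. Returning the unweighted terms makes it easy to log both and see whether one dominates.

```python
import torch
import torch.nn.functional as F

def weighted_pitch_loss(f0_pred, f0_true, sil_logits, sil_true,
                        lambda_f0=1.0, lambda_sil=1.0):
    """Hypothetical combined loss: lambda_f0 * L1(F0) + lambda_sil * BCE(silence).

    Returns the weighted total plus the detached individual terms for logging.
    """
    f0_loss = F.l1_loss(f0_pred, f0_true)
    sil_loss = F.binary_cross_entropy_with_logits(sil_logits, sil_true)
    total = lambda_f0 * f0_loss + lambda_sil * sil_loss
    return total, f0_loss.detach(), sil_loss.detach()
```

If the logged F0 term (in Hz) is orders of magnitude larger than the silence term, the silence classifier barely receives gradient; lowering `lambda_f0` (or normalizing F0) is one way to rebalance.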

Have you checked that none of `F0_fake`, `N_fake`, `s`, or `en` contains `NaN`?
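One quick way to run that check is to collect the tensors by name and report which ones contain a NaN. The helper name `nan_tensors` is hypothetical; it only relies on `torch.isnan`, and would be called with whatever tensors exist at that point in the training step.

```python
import torch

def nan_tensors(named):
    """Return the names of tensors in `named` that contain at least one NaN."""
    return [name for name, t in named.items() if torch.isnan(t).any().item()]
```

Usage inside the training loop might look like `bad = nan_tensors({"F0_fake": F0_fake, "N_fake": N_fake, "s": s, "en": en})`, followed by printing `bad` when it is non-empty, which narrows down where the NaN first appears.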

The following is the broken (and unfinished) code for `train_second.py` with DDP:

```python
# load packages
import random
import yaml
import time
from munch import Munch
import numpy as np
...
```

@zhouyong64 This issue (in-place operation) was first identified by @ABC0408, who used a different PyTorch version from mine, though we both used PyTorch > 2.0. Not sure if it...
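The specific in-place operation in `train_second.py` is not shown here, but the general error class can be reproduced in a few lines. In this sketch (a generic illustration, not the StyleTTS2 code), `torch.sigmoid` saves its output for the backward pass; modifying that output in place bumps its version counter, so `backward()` raises a `RuntimeError`. The out-of-place version computes the same value and backpropagates cleanly.

```python
import torch

def inplace_breaks_backward() -> bool:
    """Return True if an in-place edit of a saved tensor breaks backward()."""
    x = torch.ones(3, requires_grad=True)
    y = torch.sigmoid(x)  # sigmoid keeps its output for the backward pass
    y.mul_(2)             # in-place edit bumps y's version counter
    try:
        y.sum().backward()
    except RuntimeError:  # "... modified by an inplace operation"
        return True
    return False

def out_of_place_grad() -> torch.Tensor:
    """Same computation written out-of-place; the saved output stays intact."""
    x = torch.ones(3, requires_grad=True)
    y = torch.sigmoid(x) * 2  # creates a new tensor instead of editing in place
    y.sum().backward()
    return x.grad
```

Whether such an error surfaces can depend on which intermediate tensors a given PyTorch version saves for backward, which may explain why it appears on one setup and not another. Wrapping the training step in `torch.autograd.set_detect_anomaly(True)` helps locate the offending operation.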

Hi @stevenhillis, thanks for your help. The problem happens even before the discriminator kicks in, so it is unlikely to be caused by `spectral_norm`. I have tried your suggestions and luckily...

@lawlietlight Thanks for your willingness to help. Maybe you can debug this problem if you have time?

@hermanseu I think separating F0 and duration is probably fine, but you would also need to sample more dimensions in the diffusion model. Did you notice any performance drop from doing this?
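To see why more dimensions are needed: if the diffusion model samples one vector that is later split into per-component style vectors, adding a separate F0 style and a separate duration style grows that vector from 2 × style_dim to 3 × style_dim. The sketch below assumes `STYLE_DIM = 128` and an ordering of acoustic, F0, then duration; both the constant and the helper `split_style` are hypothetical.

```python
import numpy as np

STYLE_DIM = 128  # assumption: size of each individual style vector

def split_style(sample: np.ndarray, n_parts: int):
    """Split a sampled style of shape (batch, n_parts * STYLE_DIM) into n_parts chunks.

    With n_parts=2 the chunks could be (acoustic, prosodic); with separate F0 and
    duration styles, n_parts=3 could give (acoustic, f0, duration), so the
    diffusion model must output 3 * STYLE_DIM values per sample.
    """
    if sample.shape[-1] != n_parts * STYLE_DIM:
        raise ValueError(
            f"expected last dim {n_parts * STYLE_DIM}, got {sample.shape[-1]}"
        )
    return np.split(sample, n_parts, axis=-1)
```

The split itself is trivial; the practical cost is that the diffusion model now has to learn a joint distribution over a larger vector, which is one reason a performance drop is worth checking for.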