Aaron (Yinghao) Li comments

Results: 110 comments of Aaron (Yinghao) Li


Mandarin support?

@yihuitang In my case, I separated words because I used a PL-BERT trained jointly on Chinese, Japanese, and English, and word boundaries were used when pre-training that PL-BERT, but...
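To make the point about word boundaries concrete, here is a minimal sketch. It assumes each word arrives as a list of syllables and that a single space is the boundary token. The function name `join_words` and the space separator are hypothetical, since the comment does not say which token the PL-BERT was pre-trained with.

```python
from typing import List

# Hypothetical boundary token; the token actually used in PL-BERT pre-training is not stated.
WORD_SEP = " "

def join_words(words: List[List[str]], sep: str = WORD_SEP) -> str:
    """Concatenate the syllables of each word, then separate words with `sep`."""
    return sep.join("".join(syllables) for syllables in words if syllables)

words = [["ni3", "hao3"], ["shi4", "jie4"]]  # 你好 世界 ("hello world")
print(join_words(words))          # ni3hao3 shi4jie4
print(join_words(words, sep=""))  # ni3hao3shi4jie4  (word boundaries lost)
```

Without the separator, a model pre-trained on boundary-marked input sees a different token pattern than it saw during pre-training.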

Mandarin support?

@yihuitang `n_prods` should be the number of tones (e.g., for Mandarin Chinese it should be 5, for Japanese it should be 2, for Cantonese it should be 6). The tones...
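Here is a minimal sketch of how `n_prods` could bound a tone-ID sequence. It assumes tones are written as 1-based numbers (as in pinyin `ma3`) and mapped to 0-based IDs for an embedding table of size `n_prods`. Only the three counts come from the comment. The dictionary keys and the `tone_ids` helper are hypothetical.

```python
from typing import Dict, List

# Tone counts from the comment; the dictionary keys are hypothetical.
N_PRODS: Dict[str, int] = {"mandarin": 5, "japanese": 2, "cantonese": 6}

def tone_ids(tones: List[int], language: str) -> List[int]:
    """Map 1-based tone numbers to 0-based IDs in [0, n_prods)."""
    n_prods = N_PRODS[language]
    bad = [t for t in tones if not 1 <= t <= n_prods]
    if bad:
        raise ValueError(f"tones {bad} out of range for {language} (n_prods={n_prods})")
    return [t - 1 for t in tones]

print(tone_ids([3, 3, 5], "mandarin"))  # [2, 2, 4]
```

Setting `n_prods` smaller than the largest tone ID would make the embedding lookup fail, which is why the check runs before the conversion.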

Mandarin support?

@JohnHerry I'm a little confused about what "prosody segment" in #2 is. I guess the issue (#2) only involves multilingual support for phonemization; I'm not sure why it is related to prosody...

Sorry for the late reply; I've been pretty busy recently. This is likely due to a misconfiguration [here](https://github.com/yl4579/PitchExtractor/blob/main/Configs/config.yml#L16). The F0 loss and the silence loss are not well balanced, so...
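One way to check whether the two terms are balanced is to log each term's share of the weighted total. The sketch below assumes the objective is a weighted sum of an F0 loss and a silence loss. The weight names and the example numbers are hypothetical and are not taken from the linked `config.yml`.

```python
from typing import Dict

def loss_shares(losses: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Return each weighted term's fraction of the total weighted loss."""
    weighted = {name: weights[name] * value for name, value in losses.items()}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}

# Hypothetical per-batch values: at equal weights the silence term barely registers.
losses = {"f0": 0.5, "silence": 0.02}
print(loss_shares(losses, {"f0": 1.0, "silence": 1.0}))
print(loss_shares(losses, {"f0": 1.0, "silence": 25.0}))
```

If one term's share stays near zero (or near one) throughout training, its weight is a likely candidate for adjustment.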

train_second.py model.decoder error (output tensor is nan)

Have you checked that none of `F0_fake`, `N_fake`, `s`, or `en` contains `NaN`?
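A small helper along these lines can report which of those intermediates contain non-finite values. It is a sketch that uses only the standard library and plain nested lists (for example, the output of `tensor.tolist()`). With PyTorch tensors directly, `torch.isnan(t).any()` or `torch.isfinite(t).all()` performs the same check. The helper flags infinities as well as `NaN`, which is slightly broader than the question.

```python
import math
from typing import Any, Dict, Iterator, List

def _flatten(x: Any) -> Iterator[float]:
    """Yield every number in an arbitrarily nested list/tuple."""
    if isinstance(x, (int, float)):
        yield float(x)
    else:
        for item in x:
            yield from _flatten(item)

def non_finite_names(named: Dict[str, Any]) -> List[str]:
    """Return the names whose values contain NaN or +/-inf."""
    return [
        name
        for name, value in named.items()
        if any(not math.isfinite(v) for v in _flatten(value))
    ]

# Hypothetical values standing in for the real intermediates.
values = {
    "F0_fake": [[120.0, float("nan")]],
    "N_fake": [[0.1, 0.2]],
    "s": [0.0, 1.0],
    "en": [[1.0], [2.0]],
}
print(non_finite_names(values))  # ['F0_fake']
```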

Extremely weird DDP issue for train_second.py

The following is the broken (and unfinished) code for `train_second.py` with DDP:

```python
# load packages
import random
import yaml
import time
from munch import Munch
import numpy as np
...
```

Extremely weird DDP issue for train_second.py

@zhouyong64 This issue (an in-place operation) was first identified by @ABC0408, who used a different PyTorch version than I did, though we both used PyTorch > 2.0. Not sure if it...
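This class of error is easy to reproduce outside DDP. The following generic PyTorch sketch (not the code from `train_second.py`) modifies a tensor that autograd saved for the backward pass, which raises a `RuntimeError`. The out-of-place version runs without error.

```python
import torch

def backward_succeeds(inplace: bool) -> bool:
    """Return True if backward runs, False if autograd rejects an in-place edit."""
    x = torch.randn(4, requires_grad=True)
    y = torch.sigmoid(x)  # sigmoid saves its output for the backward pass
    if inplace:
        y.mul_(2)  # overwrites the saved output in place
    else:
        y = y * 2  # out-of-place: allocates a new tensor
    try:
        y.sum().backward()
    except RuntimeError:
        return False
    return True

print(backward_succeeds(inplace=False))  # True
print(backward_succeeds(inplace=True))   # False
```

When the offending line is unknown, running a training step under `torch.autograd.set_detect_anomaly(True)` prints the traceback of the forward operation whose backward failed. That traceback usually points toward the tensor that was modified in place.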

Extremely weird DDP issue for train_second.py

Hi @stevenhillis, thanks for your help. The problem happens even before the discriminator kicks in, so it is unlikely to be caused by `spectral_norm`. I have tried your suggestions and luckily...

Extremely weird DDP issue for train_second.py

@lawlietlight Thanks for your willingness to help. Maybe you can debug this problem if you have time?

Extremely weird DDP issue for train_second.py

@hermanseu I think separating F0 and duration is probably fine, but you also need to sample more dimensions in the diffusion model. Did you notice any performance drop from doing this?
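Here is a sketch of why separating F0 and duration means sampling more dimensions. If the diffusion model produces one flat vector that is then cut into style vectors, each extra style vector adds its width to what must be sampled. The sketch assumes 128 dimensions per style vector and uses hypothetical part names. It covers only the bookkeeping, not the model.

```python
from typing import Dict, List, Sequence

def split_sample(sample: Sequence[float], sizes: Dict[str, int]) -> Dict[str, List[float]]:
    """Split one flat diffusion sample into named style vectors, in dict order."""
    expected = sum(sizes.values())
    if len(sample) != expected:
        raise ValueError(f"sample has {len(sample)} dims, expected {expected}")
    parts: Dict[str, List[float]] = {}
    start = 0
    for name, size in sizes.items():
        parts[name] = list(sample[start:start + size])
        start += size
    return parts

# Hypothetical sizes: one shared prosodic style vs. separate F0 and duration styles.
joint = {"acoustic": 128, "prosody": 128}
separate = {"acoustic": 128, "f0": 128, "duration": 128}
print(sum(joint.values()), sum(separate.values()))  # 256 384
```

Under these assumed sizes, the vector the diffusion model must sample grows by half, which may make it harder to fit with the same training budget.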