Kaizhi Qian

Results 196 comments of Kaizhi Qian

You could just skip the shorter utterances.
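A minimal sketch of that filtering step. The comment does not say what length is required, so the `min_frames` threshold and the `skip_short` helper name are hypothetical:

```python
def skip_short(utterances, min_frames):
    """Keep only utterances that have at least min_frames frames.

    `utterances` is a list of per-utterance frame sequences;
    `min_frames` is a hypothetical threshold, not a value from the repo.
    """
    return [u for u in utterances if len(u) >= min_frames]


# Example: drop anything shorter than 5 frames.
kept = skip_short([[0.0] * 3, [0.0] * 10], min_frames=5)
```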

The output of the f0 predictor is a 257-dimensional logit vector, not a one-hot vector. So you need to use a cross-entropy loss, as indicated in the paper.

The target is the quantized ground-truth f0, based on https://arxiv.org/abs/2004.07370
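To illustrate how 257-way logits and a quantized f0 target fit together, here is a rough standard-library sketch. It assumes f0 has already been normalized to [0, 1], and that the 257 classes are 256 voiced bins plus one class (index 0) for unvoiced frames; that split is an assumption, not something the comments confirm. The names `quantize_f0` and `cross_entropy` are illustrative only.

```python
import math

NUM_BINS = 256  # assumed number of voiced bins; +1 unvoiced class = 257


def quantize_f0(f0_norm, num_bins=NUM_BINS):
    """Map normalized f0 values to class indices in [0, num_bins].

    Index 0 is reserved for unvoiced frames (f0 <= 0); voiced frames
    are rounded to one of num_bins bins and shifted up by one.
    """
    targets = []
    for x in f0_norm:
        if x <= 0:
            targets.append(0)
        else:
            targets.append(int(round(min(x, 1.0) * (num_bins - 1))) + 1)
    return targets


def cross_entropy(logits, targets):
    """Mean cross-entropy between per-frame logits and integer class targets."""
    total = 0.0
    for frame_logits, t in zip(logits, targets):
        m = max(frame_logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in frame_logits))
        total += log_z - frame_logits[t]
    return total / len(targets)


# One unvoiced frame and one frame at the top of the normalized range.
targets = quantize_f0([0.0, 1.0])
loss = cross_entropy([[0.0] * (NUM_BINS + 1)] * 2, targets)
```

In PyTorch, the same pairing is what `torch.nn.functional.cross_entropy` expects: raw logits plus integer class indices, rather than one-hot vectors.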

The posted solution is to use the vocoder from AutoVC. The two projects share the same vocoder, so it is not included in this repo.

First of all, you need to install the appropriate version of r9y9's Wavenet vocoder, which is a large and delicate repo by itself. We did not include it in our...

No, the pretrained model only works for speakers in the training set.

@leijue222 yes you can, but you need to re-train the model.