Kaizhi Qian comments

Results: 196 comments of Kaizhi Qian

len_crop issue when train with VCTK dataset

You could just skip the shorter utterances.
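A minimal sketch of that filtering step, assuming each utterance is stored as a sequence of frames and that `len_crop` is the crop length the data loader expects. `filter_short_utterances` and the utterance names are hypothetical and are not taken from the repo.

```python
def filter_short_utterances(utterances, len_crop):
    """Keep only utterances that have at least `len_crop` frames.

    `utterances` maps a (hypothetical) utterance name to its sequence of frames.
    Use a strict `>` instead of `>=` if the loader needs at least one spare
    frame to draw a random crop offset.
    """
    kept = {}
    for name, frames in utterances.items():
        if len(frames) >= len_crop:
            kept[name] = frames
    return kept


# Toy data: lists of zeros stand in for real spectrogram frames.
utterances = {
    "utt_a": [0.0] * 96,
    "utt_b": [0.0] * 200,
    "utt_c": [0.0] * 128,
}
usable = filter_short_utterances(utterances, len_crop=128)
print(sorted(usable))
```

Filtering once, before training starts, keeps the random-crop logic in the loader unchanged.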

F0 Converter for P - loss function values

The output of the f0 predictor is a 257-dimensional logit vector rather than a one-hot vector, so you need to use cross-entropy loss, as indicated in the paper.
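To make the shape of that loss concrete, here is a dependency-free sketch of cross-entropy for a single frame: the 257 logits are turned into log-probabilities and the loss is the negative log-probability of the target class. The function name and the toy logits are illustrative assumptions, not code from the repo.

```python
import math


def cross_entropy_from_logits(logits, target_index):
    """Cross-entropy for one frame: -log(softmax(logits)[target_index])."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


NUM_CLASSES = 257  # size of the f0 predictor output

# All-zero logits give a uniform distribution, so the loss is log(257).
uniform_logits = [0.0] * NUM_CLASSES
loss_uniform = cross_entropy_from_logits(uniform_logits, 42)

# A large logit on the correct class drives the loss toward zero.
confident_logits = [0.0] * NUM_CLASSES
confident_logits[42] = 20.0
loss_confident = cross_entropy_from_logits(confident_logits, 42)

print(loss_uniform, loss_confident)
```

In PyTorch, `torch.nn.functional.cross_entropy` computes the same quantity from raw logits and integer class indices, so the logits should not be passed through a softmax first.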

F0 Converter for P - loss function values

The target is the quantized ground-truth f0, based on https://arxiv.org/abs/2004.07370
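One way such a target could be built is sketched below. It assumes the f0 contour has already been normalized so that voiced frames lie in (0, 1] and unvoiced frames are marked with values at or below zero, and that 256 voiced bins plus one unvoiced class make up the 257 classes. These conventions are assumptions for illustration; check them against the preprocessing actually used.

```python
def quantize_f0(f0_norm, num_bins=256):
    """Map normalized f0 values to integer class indices in [0, num_bins].

    Assumed convention: values <= 0 are unvoiced and get class 0;
    voiced values in (0, 1] are spread over classes 1..num_bins.
    """
    indices = []
    for value in f0_norm:
        if value <= 0.0:
            indices.append(0)  # unvoiced frame
        elif value > 1.0:
            raise ValueError("voiced f0 must be normalized to (0, 1]")
        else:
            indices.append(int(round(value * (num_bins - 1))) + 1)
    return indices


# Four toy frames: unvoiced, very low, low, and maximum normalized pitch.
targets = quantize_f0([0.0, 0.001, 0.25, 1.0])
print(targets)
```

The resulting integers can be used directly as class indices for a cross-entropy loss over 257 classes.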

AttributeError: 'HParams' object has no attribute 'builder'

The posted solution is to use the vocoder from AutoVC. The two projects share the same vocoder, so it is not included in this repo.

AttributeError: 'HParams' object has no attribute 'builder'

First of all, you need to install the appropriate version of r9y9's Wavenet vocoder, which is a large and delicate repo by itself. We did not include it in our...

"The provided training data is very small for code verification purpose only"

No, the pretrained model only works for speakers in the training set.

"The provided training data is very small for code verification purpose only"

@leijue222 yes you can, but you need to re-train the model.

Does it will work on unseen data also? will it be able to convert voice of unseen speaker with different content than that of data in training, will we obtain the disentanglement?

You can make it generalize to unseen speakers by training it the same way as AutoVC.

Does it will work on unseen data also? will it be able to convert voice of unseen speaker with different content than that of data in training, will we obtain the disentanglement?

@skol101 it means training with generalized speaker embeddings instead of one-hot embeddings
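The difference can be sketched as follows. A one-hot embedding only has a slot for speakers known at training time, whereas a generalized embedding is a continuous vector that can be computed for any speaker. The sketch assumes per-utterance embeddings come from a separate pretrained speaker encoder and are averaged per speaker; the helper names and the toy 2-dimensional vectors are hypothetical.

```python
def one_hot(speaker_index, num_speakers):
    """One-hot speaker code: only defined for speakers seen in training."""
    vec = [0.0] * num_speakers
    vec[speaker_index] = 1.0
    return vec


def mean_embedding(utterance_embeddings):
    """Average per-utterance embeddings into one speaker embedding.

    The per-utterance vectors are assumed to come from a pretrained
    speaker encoder, so this also works for speakers unseen in training.
    """
    count = len(utterance_embeddings)
    dim = len(utterance_embeddings[0])
    return [sum(e[i] for e in utterance_embeddings) / count for i in range(dim)]


# Seen speaker number 1 out of 3 training speakers.
seen_code = one_hot(1, 3)

# Unseen speaker: two toy utterance embeddings (real ones are much larger).
unseen_code = mean_embedding([[1.0, 3.0], [3.0, 5.0]])

print(seen_code, unseen_code)
```

With one-hot codes, an unseen speaker has no valid index at all, which is why the model has to be trained on continuous embeddings to generalize.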