
lf0 question about convert phase

Open powei-C opened this issue 3 years ago • 3 comments

Hi, I wonder why you normalize the f0 series before feeding it to the f0 encoder in convert.py, even though this normalization isn't applied in the preprocessing phase.

powei-C avatar Jul 12 '22 03:07 powei-C

Hi, normalizing f0 aims to remove speaker characteristics. f0 is not normalized during the preprocessing phase, but it is normalized during training and inference, as shown here:
https://github.com/Wendison/VQMIVC/blob/851b4f5ca5bb60c11fea6a618affeb4979b17cf3/dataset.py#L53
https://github.com/Wendison/VQMIVC/blob/851b4f5ca5bb60c11fea6a618affeb4979b17cf3/convert_example.py#L57

Wendison avatar Jul 13 '22 05:07 Wendison
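
For readers following along, here is a minimal sketch of what per-utterance f0 normalization can look like. It is not copied from the repository: the function name `normalize_lf0`, the use of log-f0, and the convention that unvoiced frames are marked by `f0 == 0` are all assumptions made for illustration. The idea is that subtracting the utterance mean and dividing by the standard deviation removes a speaker's overall pitch level and range while keeping the shape of the contour.

```python
import numpy as np


def normalize_lf0(f0, eps=1e-8):
    """Z-score log-f0 over voiced frames only (hypothetical helper).

    Assumes unvoiced frames are encoded as f0 == 0; they stay 0 in the output.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    lf0 = np.zeros_like(f0)
    if not voiced.any():
        # Nothing to normalize in a fully unvoiced utterance.
        return lf0
    log_f0 = np.log(f0[voiced])
    # Remove the utterance-level mean and scale, which carry speaker identity.
    lf0[voiced] = (log_f0 - log_f0.mean()) / (log_f0.std() + eps)
    return lf0


# Two made-up contours with the same shape, one an octave higher.
low_voice = [0.0, 100.0, 120.0, 0.0, 110.0]
high_voice = [0.0, 200.0, 240.0, 0.0, 220.0]
same_after_norm = np.allclose(normalize_lf0(low_voice), normalize_lf0(high_voice))
```

In this toy case the two contours become identical after normalization, which is the sense in which the pitch input stops revealing who is speaking.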

Hi, thank you for your explanation! I have another question about perplexity when training the model on a different dataset. I found that the perplexity did not keep increasing (the figure covers around 360 epochs). Is that reasonable? Do you have any suggestions for how to investigate this? [image]

powei-C avatar Jul 13 '22 15:07 powei-C

The perplexity should increase during training, as higher perplexity indicates that the vectors in the VQ codebook are distinguishable and can be used to represent different acoustic units. I also saw that your recon_loss is high. In my experience, recon_loss should be below 0.5 to obtain good converted samples.

Wendison avatar Jul 22 '22 08:07 Wendison
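
As a rough illustration of the perplexity discussed above, the sketch below uses the definition common in VQ-VAE-style code: the exponential of the entropy of the code-usage distribution. Whether this repository computes it exactly this way is an assumption, and `codebook_perplexity` is a hypothetical helper name. It expects a non-empty array of integer code indices.

```python
import numpy as np


def codebook_perplexity(indices, codebook_size, eps=1e-10):
    """exp(entropy) of how often each codebook entry is selected.

    Ranges from about 1 (a single code is used) up to codebook_size
    (all codes are used equally often).
    """
    flat = np.asarray(indices).ravel()
    counts = np.bincount(flat, minlength=codebook_size)
    probs = counts / counts.sum()
    entropy = -np.sum(probs * np.log(probs + eps))
    return float(np.exp(entropy))


# Made-up assignments for a codebook with 8 entries.
collapsed = codebook_perplexity([3, 3, 3, 3, 3, 3, 3, 3], codebook_size=8)
spread = codebook_perplexity([0, 1, 2, 3, 0, 1, 2, 3], codebook_size=8)
```

Here `collapsed` is close to 1 and `spread` is close to 4, so a perplexity that stays flat at a low value suggests that only a few codebook entries are being selected.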