VQMIVC lf0 question about convert phase

Open powei-C opened this issue 3 years ago • 3 comments

Hi, I wonder why you normalize the f0 series before feeding it to the f0 encoder in convert.py, given that this kind of f0 normalization isn't used in the preprocessing phase.

Jul 12 '22 03:07 powei-C

Hi, normalizing f0 aims to remove the speaker characteristics. During the preprocessing phase, f0 is not normalized, but during training and inference it is normalized, as shown at these lines:
https://github.com/Wendison/VQMIVC/blob/851b4f5ca5bb60c11fea6a618affeb4979b17cf3/dataset.py#L53
https://github.com/Wendison/VQMIVC/blob/851b4f5ca5bb60c11fea6a618affeb4979b17cf3/convert_example.py#L57
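
For readers who want to see why this removes speaker characteristics, here is a minimal sketch of per-utterance log-f0 normalization. It is not copied from the repository; the linked lines are the authoritative version. The helper name `normalize_lf0`, the `eps` value, and the convention that unvoiced frames are stored as exactly 0 are assumptions made for illustration.

```python
import numpy as np

def normalize_lf0(lf0, eps=1e-8):
    """Hypothetical helper: z-score the voiced frames of one utterance's log-f0."""
    lf0 = np.asarray(lf0, dtype=np.float64).copy()
    voiced = lf0 != 0  # assumption: unvoiced frames are stored as exactly 0
    if not voiced.any():
        return lf0  # nothing to normalize in a fully unvoiced utterance
    mean = lf0[voiced].mean()
    std = lf0[voiced].std()
    # Only voiced frames are shifted and scaled; unvoiced frames stay at 0.
    lf0[voiced] = (lf0[voiced] - mean) / (std + eps)
    return lf0

# Two made-up speakers with the same contour shape, one an octave higher.
low = np.log(np.array([100.0, 110.0, 120.0]))
high = np.log(np.array([200.0, 220.0, 240.0]))
low_speaker = np.concatenate(([0.0], low, [0.0]))
high_speaker = np.concatenate(([0.0], high, [0.0]))

norm_low = normalize_lf0(low_speaker)
norm_high = normalize_lf0(high_speaker)
same_contour = np.allclose(norm_low, norm_high)
```

In the log domain an octave shift is a constant offset, so subtracting the per-utterance mean cancels it and both speakers end up with the same normalized contour. That is the sense in which normalization strips the speaker's pitch range while keeping the intonation pattern.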

Jul 13 '22 05:07 Wendison

Hi, thank you for your explanation! I have another question, about perplexity when training the model on another dataset. I found that the perplexity didn't keep increasing (the figure covers around 360 epochs of training). Is this reasonable? And do you have any suggestions for diagnosing this issue?

Jul 13 '22 15:07 powei-C

The perplexity should increase during training, since higher perplexity indicates that the vectors in the VQ codebook are distinguishable and can be used to represent different acoustic units. I also see that your recon_loss is high. In my experience, recon_loss should be below 0.5 for you to obtain good converted samples.
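
As a rough way to check codebook usage independently of the training logs, here is a sketch of the usual VQ perplexity definition: the exponential of the entropy of the average code-usage distribution. The function name `codebook_perplexity` and the toy index arrays are hypothetical, and the repository's own computation may differ in details such as a small epsilon inside the logarithm.

```python
import numpy as np

def codebook_perplexity(indices, num_codes):
    """Hypothetical helper: exp(entropy) of how often each code index is selected."""
    indices = np.asarray(indices).ravel()
    counts = np.bincount(indices, minlength=num_codes)
    probs = counts / counts.sum()
    used = probs[probs > 0]  # skip unused codes so that log(0) never occurs
    entropy = -np.sum(used * np.log(used))
    return float(np.exp(entropy))

# Toy batches of selected code indices for a 4-entry codebook.
spread_out = codebook_perplexity([0, 1, 2, 3], num_codes=4)  # every code used equally
collapsed = codebook_perplexity([2, 2, 2, 2], num_codes=4)   # one code used for everything
```

Under this definition the value ranges from 1 (every frame maps to the same code) up to the codebook size (all codes used equally often), so a perplexity that plateaus far below the codebook size suggests that only a few codes are actually in use.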

Jul 22 '22 08:07 Wendison
