vcc20_baseline_cyclevae
.f0 and .pow features
Hi! I am training the model on my custom dataset and I have a problem with the .f0 and .pow files in the egs/cyclevae/conf folder.
As I understand it:
spk_name.f0 contains the min and max f0 values for the speaker spk_name, which are used as cutoffs. spk_name.pow has a single value with a power threshold (what does it mean?)
I don't quite understand: should I compute these values myself somehow, or is there code in the repo that can do it?
Thank you.
I'm also not sure about this, but I did find a discussion about setting these here on slides 20 and 21:
https://www.slideshare.net/NU_I_TODALAB/hands-on-voice-conversion
Hi Solomid and bobbdunn, sorry for the late reply.
"spk_name.f0" contains the upper and lower bounds of spk_name's f0 used for feature extraction. "spk_name.pow" contains the power threshold in dB (default is -20 dB) used for the VAD preprocessing in voice conversion.
So when adopting a new dataset, you need to set these values carefully yourself for each new speaker. We set these values based on the f0 and power distributions. The process of plotting the f0 and power distributions can be found in the following repo: https://github.com/k2kobayashi/sprocket
For the f0 range, we usually set the upper and lower bounds according to the range of the main lobe of the f0 distribution. For the power threshold, there should be two peaks in the power distribution: one for the speech frames and the other for the silence frames. We usually set the midpoint between these two peaks as the power threshold.
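As a rough illustration of the rule of thumb above, here is a minimal Python sketch that reads both values off histograms. It is not code from this repo or from sprocket. The function names, the bin count, the 5% relative height used to delimit the main lobe, and the 10 dB minimum peak separation are all assumptions made for the example. It also assumes you already have per-frame f0 values (0 for unvoiced frames) and per-frame power in dB for one speaker as NumPy arrays.

```python
import numpy as np


def f0_range_from_main_lobe(f0, bins=100, rel_height=0.05):
    """Return (lower, upper) f0 bounds spanning the main histogram lobe.

    Hypothetical helper: the lobe is taken as the contiguous run of bins
    around the tallest bin whose counts stay above rel_height * peak count.
    """
    voiced = f0[f0 > 0]  # assumption: unvoiced frames are stored as 0
    counts, edges = np.histogram(voiced, bins=bins)
    peak = int(np.argmax(counts))
    floor = rel_height * counts[peak]

    # Walk outwards from the peak until the counts drop to the floor.
    lo = peak
    while lo > 0 and counts[lo - 1] > floor:
        lo -= 1
    hi = peak
    while hi < len(counts) - 1 and counts[hi + 1] > floor:
        hi += 1
    return float(edges[lo]), float(edges[hi + 1])


def power_threshold_between_peaks(power_db, bins=100, min_sep_db=10.0):
    """Return the midpoint (in dB) between the two main power peaks.

    Hypothetical helper: the second peak is the tallest bin that lies at
    least min_sep_db away from the tallest bin overall.
    """
    counts, edges = np.histogram(power_db, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    first = int(np.argmax(counts))

    far = np.abs(centers - centers[first]) >= min_sep_db
    if not far.any():
        raise ValueError("no second peak found; check the distribution by eye")
    second = int(np.argmax(np.where(far, counts, -1)))
    return float(0.5 * (centers[first] + centers[second]))
```

Treat the output as a starting point only, and still plot the two distributions and check the values by eye, since octave errors in f0 or a missing silence peak can mislead a simple heuristic like this.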
More details can be found in the sprocket-vc repo and the above slides.