vcc20_baseline_cyclevae

.f0 and .pow features

Open SolomidHero opened this issue 4 years ago • 2 comments

Hi! I am training the model on my custom dataset and I have a problem with the .f0 and .pow files in the egs/cyclevae/conf folder.

If I understand correctly:

  • spk_name.f0 contains the min and max f0 values for the speaker spk_name, which are used as cutoffs.
  • spk_name.pow has a single value with a power threshold (what does it mean?)

I don't quite understand: should I compute these values myself somehow, or is there code in the repo that can do it?

Thank you.

SolomidHero, Jan 25 '21 11:01

I'm also not sure about this, but I did find a discussion about setting these here on slides 20 and 21:

https://www.slideshare.net/NU_I_TODALAB/hands-on-voice-conversion

bobbdunn, Jan 28 '21 16:01

Hi Solomid and bobbdunn, sorry for the late reply.

"spk_name.f0" contains the upper and lower bounds of spk_name's f0 used for feature extraction. "spk_name.pow" contains the power threshold in dB (default is -20 dB) used for the VAD preprocessing in voice conversion.
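To illustrate what a power threshold in dB does in a VAD step, here is a minimal sketch in plain Python. It is not the code from this repo: the function names, the frame length, the hop size, and the choice to express power relative to the mean frame power are all assumptions made for the example.

```python
import math


def frame_power_db(samples, frame_len=400, hop=80):
    """Per-frame power in dB, relative to the mean frame power.

    frame_len, hop, and the mean-power normalization are assumptions
    for this sketch, not values taken from the repo.
    """
    powers = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    if not powers:
        return []
    mean_power = sum(powers) / len(powers)
    eps = 1e-12  # avoids log10(0) on digital silence
    return [10.0 * math.log10((p + eps) / (mean_power + eps)) for p in powers]


def speech_mask(power_db, threshold_db=-20.0):
    """True for frames whose power is above the threshold (treated as speech)."""
    return [p > threshold_db for p in power_db]
```

Frames below the threshold are treated as silence and can be dropped before training; raising the threshold discards more low-energy frames.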

So when adopting a new dataset, you need to set these values carefully for each new speaker yourself. We set them based on the f0 and power distributions. The process of plotting the f0 and power distributions can be found in the following repo: https://github.com/k2kobayashi/sprocket

For the f0 range, we usually set the upper and lower bounds according to the range of the main lobe of the f0 distribution. For the power threshold, there should be two peaks in the power distribution: one for the speech frames and the other for the silence frames. We usually set the midpoint between these two peaks as the power threshold.
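The rule of thumb above can be roughed out in code. The sketch below is only an illustration in plain Python, not the sprocket implementation: all function names, the `rel_floor` cutoff that defines the "main lobe", and the `min_sep_bins` peak separation are hypothetical choices. It assumes the f0 values contain voiced frames only (unvoiced frames with f0 = 0 already removed) and that power values are in dB. The result should still be checked against the plotted histograms by eye.

```python
def histogram(values, lo, hi, n_bins):
    """Count values into n_bins equal-width bins over [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges


def main_lobe_range(counts, edges, rel_floor=0.1):
    """Grow outward from the tallest bin while neighbours stay above
    rel_floor * peak height; return the (lower, upper) edges of that lobe."""
    peak = max(range(len(counts)), key=lambda i: counts[i])
    floor = rel_floor * counts[peak]
    left = peak
    while left > 0 and counts[left - 1] >= floor:
        left -= 1
    right = peak
    while right < len(counts) - 1 and counts[right + 1] >= floor:
        right += 1
    return edges[left], edges[right + 1]


def two_peak_midpoint(counts, edges, min_sep_bins=5):
    """Midpoint between the tallest bin and the tallest local maximum
    that lies at least min_sep_bins away from it."""
    n = len(counts)
    centers = [0.5 * (edges[i] + edges[i + 1]) for i in range(n)]

    def is_local_max(i):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < n - 1 else 0
        return counts[i] > 0 and counts[i] >= left and counts[i] >= right

    peak1 = max(range(n), key=lambda i: counts[i])
    others = [i for i in range(n)
              if is_local_max(i) and abs(i - peak1) >= min_sep_bins]
    if not others:
        raise ValueError("no second peak found; inspect the histogram by eye")
    peak2 = max(others, key=lambda i: counts[i])
    return 0.5 * (centers[peak1] + centers[peak2])
```

With noisy real data the histogram may need coarser bins or smoothing before the second peak is reliable. Isolated outlier bins, such as octave errors in f0, fall outside the main lobe because the empty bins between them and the lobe stop the search.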

More details can be found in the sprocket-vc repo and the above slides.

bigpon, Jan 28 '21 17:01