DeepLearningExamples

[FastPitch1.1/pytorch] Should `pitch_std` and `pitch_mean` be configured per speaker?

Open JohnHerry opened this issue 3 years ago • 0 comments

Related to FastPitch1.1/pytorch

Describe the bug

I have trained a multi-speaker FastPitch model. At inference, when I use any one of --pitch-transform-flatten, --pitch-transform-invert, or --pitch-transform-amplify to modify the predicted pitch, the resulting waveform sounds strange.

To Reproduce

Train a multi-speaker checkpoint in which different speakers have different pitch ranges, then run inference with one of the pitch-transform options above.

Expected behavior

I guess this happens because my multi-speaker dataset contains both men and women, whose pitch ranges are quite different. So I think that during training, the pitch-mean and pitch-std parameters should both be vectors with one entry per speaker, so that each speaker has its own values. At inference, (speakerId, pitch_std, pitch_mean) should then be passed as a grouped set of parameters for the specified speaker.
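To make the proposal concrete, here is a minimal sketch of per-speaker pitch statistics applied to an invert transform. It is not the FastPitch implementation: the function names, the dict-based layout, and the convention that 0.0 marks an unvoiced frame are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def per_speaker_pitch_stats(f0_by_speaker):
    """Return {speaker_id: (mean, std)} computed over voiced frames only.

    Assumption of this sketch: f0_by_speaker maps a speaker id to a flat
    list of F0 values in Hz, with 0.0 marking unvoiced frames.
    """
    stats = {}
    for speaker_id, f0 in f0_by_speaker.items():
        voiced = [v for v in f0 if v > 0.0]
        stats[speaker_id] = (mean(voiced), pstdev(voiced))
    return stats


def invert_pitch_hz(f0, speaker_id, stats):
    """Invert a pitch contour around the given speaker's own mean.

    Voiced frames are normalized with that speaker's (mean, std), negated
    in normalized space, and mapped back to Hz. Unvoiced frames stay 0.0.
    """
    mu, sigma = stats[speaker_id]
    out = []
    for v in f0:
        if v > 0.0:
            normalized = (v - mu) / sigma
            out.append(-normalized * sigma + mu)
        else:
            out.append(0.0)
    return out


# Toy data: one low-pitched and one high-pitched speaker (hypothetical values).
f0_by_speaker = {
    "male": [100.0, 0.0, 120.0],
    "female": [200.0, 240.0],
}
stats = per_speaker_pitch_stats(f0_by_speaker)
inverted_male = invert_pitch_hz(f0_by_speaker["male"], "male", stats)
```

With per-speaker statistics, the low-pitched speaker's contour is mirrored around 110 Hz and stays in that speaker's range. With a single global mean over the same toy data (165 Hz), the 100 Hz frame would instead be mirrored to 230 Hz, which may be the kind of effect behind the strange-sounding output described above.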

Environment

Please provide at least:

  • Container version (e.g. pytorch:19.05-py3):
  • GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB):
  • CUDA driver version (e.g. 418.67):

JohnHerry, Jun 01 '22 13:06