DeepLearningExamples
[FastPitch1.1/pytorch] should `pitch_std` and `pitch_mean` be configs identical to speaker?
Related to FastPitch1.1/pytorch
Describe the bug
I have trained a multi-speaker FastPitch. During inference, when I use any of the --pitch-transform-flatten, --pitch-transform-invert, or --pitch-transform-amplify flags to modify the predicted pitch, the resulting waveform sounds strange.
To Reproduce
Train a multi-speaker checkpoint where the speakers have different pitch ranges.
Expected behavior
I suspect this happens because my multi-speaker dataset contains both male and female speakers, whose pitch ranges differ considerably. During training, the pitch-mean and pitch-std parameters should therefore be vectors whose length equals the number of speakers, so that each speaker gets its own statistics. At inference time, (speaker_id, pitch_mean, pitch_std) should then form a grouped set of parameters for the selected speaker (see the sketch below).
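To make the proposal concrete, here is a minimal sketch of what per-speaker pitch normalization could look like. The helper names (compute_speaker_pitch_stats, normalize_pitch, denormalize_pitch) are hypothetical and not part of the repository's actual code; it only illustrates replacing the single global pitch_mean/pitch_std with per-speaker vectors.

```python
# Hypothetical sketch: per-speaker pitch statistics instead of a single
# global pitch_mean / pitch_std. Not the repository's actual API.
import torch

def compute_speaker_pitch_stats(pitches, speaker_ids, n_speakers):
    """Return per-speaker (mean, std) vectors of shape [n_speakers]."""
    means = torch.zeros(n_speakers)
    stds = torch.ones(n_speakers)
    for spk in range(n_speakers):
        # Collect voiced (non-zero) pitch frames of this speaker only.
        frames = [p[p > 0] for p, s in zip(pitches, speaker_ids) if s == spk]
        if frames:
            spk_pitch = torch.cat(frames)
            means[spk] = spk_pitch.mean()
            stds[spk] = spk_pitch.std().clamp(min=1e-5)
    return means, stds

def normalize_pitch(pitch, speaker_id, means, stds):
    """Normalize with the statistics of the utterance's own speaker."""
    out = pitch.clone()
    voiced = pitch > 0                      # leave unvoiced frames at zero
    out[voiced] = (pitch[voiced] - means[speaker_id]) / stds[speaker_id]
    return out

def denormalize_pitch(pitch_norm, speaker_id, means, stds):
    """Map a predicted (normalized) pitch back to Hz for this speaker."""
    return pitch_norm * stds[speaker_id] + means[speaker_id]
```

With statistics grouped per speaker like this, a pitch transform such as flatten or amplify would operate around that speaker's own mean rather than a mixed male/female average, which is what I expect would fix the strange-sounding output.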
Environment
Please provide at least:
- Container version (e.g. pytorch:19.05-py3):
- GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB):
- CUDA driver version (e.g. 418.67):