FastSpeech2
The max_wav_value in preprocess.yaml should be smaller than 32768.0
If max_wav_value is 32768, then running wav = wav / max(abs(wav)) * max_wav_value in preprocess_align.py can overflow the data: a positive peak is scaled to 32768, one above the largest value a 16-bit integer can hold (32767). The overflow introduces brief bursts of noise in the training wav, which in turn show up as bursts of noise in the generated wav. So max_wav_value should be slightly smaller than 32768.
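To make the overflow concrete, here is a minimal sketch that assumes the scaled samples are later stored as 16-bit integers. The helper names scale_peak and fits_int16 are hypothetical and not part of the repository; the sample array is made up for illustration.

```python
import numpy as np

INT16_MIN = np.iinfo(np.int16).min  # -32768
INT16_MAX = np.iinfo(np.int16).max  # 32767


def scale_peak(wav, max_wav_value):
    """Peak-normalize a float waveform, then scale it to max_wav_value."""
    return wav / np.max(np.abs(wav)) * max_wav_value


def fits_int16(scaled):
    """Return True when every sample lies inside the int16 range."""
    return bool(scaled.min() >= INT16_MIN and scaled.max() <= INT16_MAX)


# Toy waveform whose largest peak is positive (hypothetical data).
wav = np.array([0.0, 0.25, -0.5, 1.0], dtype=np.float32)

overflowing = scale_peak(wav, 32768.0)  # positive peak lands on 32768.0
safe = scale_peak(wav, 32767.0)         # positive peak lands on 32767.0

print(fits_int16(overflowing), fits_int16(safe))
```

Only a positive peak triggers the problem, since -32768 is still a valid int16 value. That may be why the noise appears in some files and not others.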
mark
I can't understand why we should do this: "wav = wav / max(abs(wav)) * max_wav_value". Can you share the reason?
Try this:
max_wav_value: 32767.5
wav = wav / max(abs(wav)) * max_wav_value - 0.5  # data range -32768 to 32767
This should fit in the np.int16 range.
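The suggestion above can be sketched as a small function. The name to_int16 is hypothetical, and the final cast to np.int16 is an assumption about how the samples are stored afterwards.

```python
import numpy as np


def to_int16(wav, max_wav_value=32767.5):
    """Peak-normalize, scale by max_wav_value, shift by 0.5, and cast to int16.

    With max_wav_value = 32767.5 the scaled values span -32768.0 to 32767.0,
    which matches the int16 range exactly.
    """
    scaled = wav / np.max(np.abs(wav)) * max_wav_value - 0.5
    return scaled.astype(np.int16)  # the cast truncates toward zero


# Toy waveform with both a positive and a negative full-scale peak.
wav = np.array([1.0, -1.0, 0.0], dtype=np.float64)
pcm = to_int16(wav)
print(pcm.dtype, pcm.min(), pcm.max())
```

The 0.5 shift is what lets both extremes be used: without it, a symmetric scale factor has to stop at 32767 to keep the positive peak in range.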
I can't understand why we should do this: "wav = wav / max(abs(wav)) * max_wav_value". Can you share the reason?
Do you understand its purpose now? In preprocessor.py, the wav value read with librosa.load(wav_path) is still between -1 and 1. Why then use wav = wav / max(abs(wav)) * max_wav_value to scale the wav values to the range -max_wav_value to max_wav_value?