FastSpeech2

The max_wav_value in preprocess.yaml should be smaller than 32768.0

Open Georgehappy1 opened this issue 3 years ago • 4 comments

If max_wav_value is 32768, running wav = wav / max(abs(wav)) * max_wav_value in preprocess_align.py scales the loudest sample to ±32768. The int16 range is only -32768 to 32767, so a positive peak of 32768 can overflow. That puts short bursts of noise into the training wavs, and eventually into the generated wavs as well. So max_wav_value should be a little smaller than 32768.
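To make the overflow concrete, here is a minimal sketch using NumPy. The helper peak_normalize and the test signal are hypothetical and not taken from the repository; the cast goes through int32 only so that the wrap-around is reproducible in this demo.

```python
import numpy as np

INT16_MAX = np.iinfo(np.int16).max  # 32767


def peak_normalize(wav, max_wav_value):
    """Scale wav so its largest absolute sample equals max_wav_value."""
    return wav / np.max(np.abs(wav)) * max_wav_value


# Hypothetical test signal whose positive peak is exactly 1.0.
wav = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5], dtype=np.float32)

# With 32768.0 the positive peak lands one step above the int16 maximum.
scaled = peak_normalize(wav, 32768.0)
overflows = bool(scaled.max() > INT16_MAX)

# Casting via int32 makes the wrap-around deterministic for this demo.
# A direct float -> int16 cast of an out-of-range value is not guaranteed
# to behave the same way on every platform or NumPy version.
wrapped = scaled.astype(np.int32).astype(np.int16)

# With 32767.0 the same peak fits and keeps its sign.
safe = peak_normalize(wav, 32767.0).astype(np.int16)

print("peak exceeds int16 max:", overflows)
print("positive peak after wrap:", wrapped[2])
print("positive peak with 32767.0:", safe[2])
```

In this sketch the positive peak flips to the most negative int16 value, which is the kind of sudden full-scale jump that would be heard as a click.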

Georgehappy1 avatar Dec 30 '21 07:12 Georgehappy1

mark

leslie2046 avatar Apr 16 '22 14:04 leslie2046

I cannot understand why we should do this: "wav = wav / max(abs(wav)) * max_wav_value". Can you share the reason?

yangdongchao avatar Jul 12 '22 08:07 yangdongchao

Try this:
max_wav_value: 32767.5
wav = wav / max(abs(wav)) * max_wav_value - 0.5  # data range -32768~32767, which should fit in the np.int16 range
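A quick way to check the range claimed here is sketched below. It assumes NumPy and a made-up signal; only the value 32767.5 and the formula come from the suggestion itself.

```python
import numpy as np

max_wav_value = 32767.5  # value suggested in this comment

# Hypothetical signal that reaches both extremes after peak normalization.
wav = np.array([-0.8, -0.2, 0.0, 0.4, 0.8], dtype=np.float64)

# Peak-normalize to [-32767.5, 32767.5], then shift down by 0.5.
scaled = wav / np.max(np.abs(wav)) * max_wav_value - 0.5

info = np.iinfo(np.int16)
fits = bool(info.min <= scaled.min() and scaled.max() <= info.max)

as_int16 = scaled.astype(np.int16)

print("min:", scaled.min(), "max:", scaled.max())
print("fits in int16:", fits)
```

The extremes come out as -32768.0 and 32767.0, so the cast to np.int16 stays inside the representable range.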

vincentwu0730 avatar Sep 02 '22 07:09 vincentwu0730

I cannot understand why we should do this: "wav = wav / max(abs(wav)) * max_wav_value". Can you share the reason?

Do you understand its function now? In preprocessor.py, the wav read with wav, _ = librosa.load(wav_path) is still between -1 and 1. So why is wav = wav / max(abs(wav)) * max_wav_value still used to enlarge the wav values to the range -max_wav_value to max_wav_value?

zhoufqing avatar Oct 11 '23 11:10 zhoufqing