espnet_onnx

wav quality drop

Open 1nlplearner opened this issue 2 years ago • 2 comments

Hi, I initialized Text2Speech with my own acoustic model (am_model) and vocoder and exported them to ONNX, but the sound quality drops significantly. The only change I made was to the HiFi-GAN inference code in https://github.com/Masao-Someki/espnet_onnx/blob/feature/add_PWGVocoder/espnet_onnx/export/tts/models/vocoders/parallel_wavegan.py, because the HiFi-GAN code in the ParallelWaveGAN repo does not support the parameter x. I compared the ESPnet acoustic model and vocoder with their ONNX counterparts, and they look the same. Could you please offer some advice?

1nlplearner, Nov 28 '22 08:11

When I delete the postprocess code in https://github.com/Masao-Someki/espnet_onnx/blob/master/espnet_onnx/tts/tts_model.py, the model synthesizes speech the same way PyTorch inference does.

1nlplearner, Nov 29 '22 11:11

@1nlplearner Thank you for reporting this issue.

> When I delete the postprocess code in https://github.com/Masao-Someki/espnet_onnx/blob/master/espnet_onnx/tts/tts_model.py, the model synthesizes speech the same way PyTorch inference does.

It seems that the normalization process causes this issue. Could you open your config file at ~/.cache/espnet_onnx/<tag_name>/config.yml and check whether use_normalize is set to False? I think setting use_normalize: false will fix this problem.
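
For anyone who wants to script that check, here is a minimal sketch in Python. It makes several assumptions that are not confirmed in this thread: that config.yml is plain YAML readable with PyYAML, and that use_normalize may sit at any nesting depth (the exact layout of the file is not shown above, so the code searches for the key instead of hard-coding a path). The helper names `set_nested_key` and `disable_normalize` are hypothetical and not part of espnet_onnx.

```python
from pathlib import Path


def set_nested_key(node, key, value, prefix=""):
    """Set every occurrence of `key` inside nested dicts/lists to `value`.

    Returns the dotted paths of the entries that were actually changed.
    """
    changed = []
    if isinstance(node, dict):
        for k, v in node.items():
            path = f"{prefix}.{k}" if prefix else str(k)
            if k == key:
                if v != value:
                    node[k] = value  # same key, so the dict size is unchanged
                    changed.append(path)
            else:
                changed.extend(set_nested_key(v, key, value, path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            changed.extend(set_nested_key(item, key, value, f"{prefix}[{i}]"))
    return changed


def disable_normalize(config_path):
    """Hypothetical helper: load the YAML config, set use_normalize to False, save it."""
    import yaml  # PyYAML (third-party); imported here so set_nested_key works without it

    path = Path(config_path).expanduser()
    config = yaml.safe_load(path.read_text())
    changed = set_nested_key(config, "use_normalize", False)
    if changed:
        # Note: rewriting the file this way drops any YAML comments it contained.
        path.write_text(yaml.safe_dump(config, sort_keys=False))
    return changed
```

Calling `disable_normalize("~/.cache/espnet_onnx/<tag_name>/config.yml")` with `<tag_name>` replaced by your own model tag returns the list of entries it changed. An empty list means the key was either absent or already false, in which case normalization in the config is probably not the cause. Editing the file by hand works just as well and keeps any comments intact.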

Masao-Someki, Dec 01 '22 11:12