Add xfolding to tacotron2 infer pipeline
When vocoding a single example, folding the input into a batch of chunks lets inference run faster.
https://github.com/pytorch/audio/blob/31dbb7540c78fe5d176948764cf9a20f55ac80dc/examples/pipeline_wavernn/wavernn_inference_wrapper.py#L167-L177
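For readers unfamiliar with the idea, here is a minimal sketch of folding a sequence into a batch of overlapping chunks and stitching the results back together with a linear crossfade. It is not the torchaudio implementation linked above. The function names `fold_with_overlap` and `xfade_and_unfold`, the plain Python lists, and the identity "model" are all assumptions made for illustration. A real vocoder would operate on tensors and produce an upsampled waveform for each chunk.

```python
def fold_with_overlap(signal, chunk_len, overlap):
    """Split a 1-D sequence into equal-length chunks that share `overlap` samples.

    Hypothetical helper: the tail is zero-padded so the last chunk is full.
    """
    if overlap < 0 or 2 * overlap > chunk_len:
        raise ValueError("need 0 <= overlap <= chunk_len / 2")
    hop = chunk_len - overlap
    n = len(signal)
    # Ceiling division: number of chunks needed to cover the signal.
    num_chunks = max(1, -(-(n - overlap) // hop))
    padded_len = num_chunks * hop + overlap
    padded = list(signal) + [0.0] * (padded_len - n)
    return [padded[i * hop : i * hop + chunk_len] for i in range(num_chunks)]


def xfade_and_unfold(chunks, overlap, length):
    """Overlap-add the chunks with a linear crossfade, then trim to `length`."""
    chunk_len = len(chunks[0])
    if overlap < 0 or 2 * overlap > chunk_len:
        raise ValueError("need 0 <= overlap <= chunk_len / 2")
    hop = chunk_len - overlap
    out = [0.0] * (len(chunks) * hop + overlap)
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            weight = 1.0
            if i > 0 and j < overlap:
                # Fade in at the start of every chunk except the first.
                weight = (j + 1) / (overlap + 1)
            elif i < last and j >= chunk_len - overlap:
                # Fade out at the end of every chunk except the last.
                weight = (chunk_len - j) / (overlap + 1)
            out[i * hop + j] += weight * value
    return out[:length]


# Usage: the chunks form a batch that a model could process in one call.
signal = [float(i) for i in range(11)]
batch = fold_with_overlap(signal, chunk_len=4, overlap=2)
processed = [list(chunk) for chunk in batch]  # stand-in for the vocoder
restored = xfade_and_unfold(processed, overlap=2, length=len(signal))
```

In each overlapped region the fade-in and fade-out weights sum to one, so with an identity model the original sequence is recovered up to floating-point rounding. The speedup comes from running the model once over the whole batch instead of stepping through the full sequence.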
I excluded it from the initial tacotron2 pipeline because of https://github.com/pytorch/audio/issues/1742. We can re-implement it while working out why #1742 happened.
https://github.com/pytorch/audio/blob/31dbb7540c78fe5d176948764cf9a20f55ac80dc/examples/pipeline_wavernn/wavernn_inference_wrapper.py#L32-L129
Is this tacotron2 related? Or is the method only for WaveRNN?
It's for WaveRNN, but it is implemented in the TTS pipeline, in this class:
https://github.com/pytorch/audio/blob/56f3b92746022cad8bd20f23b7a92023fb5560cc/torchaudio/pipelines/_tts/impl.py#L71-L96