audio Moving WaveRNN padding into the model

Open mthrok opened this issue 4 years ago • 0 comments

Currently, WaveRNN's forward method expects client code to pad the input spectrogram by a specific amount, (kernel_size - 1) // 2. This breaks encapsulation: the forward method should perform the padding itself, as the newly added infer method already does.
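To make the padding amount concrete, here is a small arithmetic sketch. It assumes the spectrogram first passes through an unpadded 1D convolution of width kernel_size, and that the padding is applied on both sides of the time axis. The helper names `pad_per_side` and `frames_after_unpadded_conv` are hypothetical and exist only for this illustration.

```python
def pad_per_side(kernel_size: int) -> int:
    """Frames to add on each side of the time axis (assumed symmetric)."""
    return (kernel_size - 1) // 2


def frames_after_unpadded_conv(n_time: int, kernel_size: int, pad: int = 0) -> int:
    """Time frames left after an unpadded convolution of width kernel_size."""
    return n_time + 2 * pad - kernel_size + 1


kernel_size = 5
n_time = 100

pad = pad_per_side(kernel_size)                                   # 2
unpadded = frames_after_unpadded_conv(n_time, kernel_size)        # 96
padded = frames_after_unpadded_conv(n_time, kernel_size, pad)     # 100

# Without parentheses, Python parses `kernel_size - 1 // 2` as
# `kernel_size - (1 // 2)`, which is just kernel_size.
unparenthesized = kernel_size - 1 // 2                            # 5
```

For an odd kernel_size, padding by (kernel_size - 1) // 2 on each side keeps the number of time frames unchanged, which is why the caller currently has to know this value.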

General direction

In forward, add padding before upsample (https://github.com/pytorch/audio/blob/483d8fae63f0102a31e9842a593f462399116fbd/torchaudio/models/wavernn.py#L319-L322), as done in infer (https://github.com/pytorch/audio/blob/483d8fae63f0102a31e9842a593f462399116fbd/torchaudio/models/wavernn.py#L381).
Update the collate function in the training script so that it does not perform padding: https://github.com/pytorch/audio/blob/a6f9cf8babfb096381e914e23950371478672b3e/examples/pipeline_wavernn/datasets.py#L76
Update tests and surrounding comments about the shape.
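The first step above could look roughly like the following sketch. This is not the torchaudio implementation: `ToyPaddedConv` is a hypothetical stand-in module with a single unpadded `Conv1d`, used only to show padding the time axis inside forward so that callers can pass an unpadded spectrogram. The symmetric padding and the (n_batch, n_freq, n_time) input layout are assumptions of this example.

```python
import torch
import torch.nn.functional as F


class ToyPaddedConv(torch.nn.Module):
    """Hypothetical stand-in for a model whose first layer is an unpadded conv."""

    def __init__(self, n_freq: int, kernel_size: int) -> None:
        super().__init__()
        self.kernel_size = kernel_size
        # No `padding` argument: the convolution shortens the time axis
        # by kernel_size - 1 frames.
        self.conv = torch.nn.Conv1d(n_freq, n_freq, kernel_size, bias=False)

    def forward(self, specgram: torch.Tensor) -> torch.Tensor:
        # specgram: (n_batch, n_freq, n_time), not padded by the caller.
        pad = (self.kernel_size - 1) // 2
        # Pad only the last (time) dimension, on both sides.
        specgram = F.pad(specgram, (pad, pad))
        return self.conv(specgram)


model = ToyPaddedConv(n_freq=8, kernel_size=5)
specgram = torch.rand(2, 8, 100)
output = model(specgram)  # time axis stays at 100 frames
```

With the padding inside the model, the collate function and the tests only need to deal with the unpadded shape.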

Oct 13 '21 13:10 mthrok
