Moving WaveRNN padding into the model
Currently, WaveRNN's `forward` method expects client code to pad the input spectrogram by `(kernel_size - 1) // 2` on each side of the time dimension. This breaks encapsulation: the `forward` method should perform the padding itself, as the newly added `infer` method already does.
General direction
- In `forward`, add padding before `upsample` (https://github.com/pytorch/audio/blob/483d8fae63f0102a31e9842a593f462399116fbd/torchaudio/models/wavernn.py#L319-L322), as done in `infer` (https://github.com/pytorch/audio/blob/483d8fae63f0102a31e9842a593f462399116fbd/torchaudio/models/wavernn.py#L381).
- Update the collate function in the training script so that it no longer performs padding (https://github.com/pytorch/audio/blob/a6f9cf8babfb096381e914e23950371478672b3e/examples/pipeline_wavernn/datasets.py#L76).
- Update tests and surrounding comments that describe the expected input shape.