audio Hardcoded padding mode in functional.spectrogram

Hardcoded padding mode in functional.spectrogram

Open gormat opened this issue 2 years ago • 3 comments

I wonder why the padding mode is hardcoded in functional.spectrogram? Maybe add a parameter to support reflect padding?

https://github.com/pytorch/audio/blob/main/torchaudio/functional/functional.py#L117

Feb 06 '23 09:02 gormat

Hi @gormat

This is an old implementation and the context is lost, so we are not sure why it's implemented this way. From our understanding, adding extra padding like this does not make much sense (especially with the default argument where stft adds reflection padding)

What behavior do you think is ideal?

Feb 21 '23 20:02 mthrok

First, I think it is confusing to have two arguments for the function one called pad and another pad_mode which refer to entirely different things (pad stands for tow-sided padding of the signal and pad_mode is for stft). Next, migrating from different libraries to torchaudio causes numerical inconsistency due to fixed reflect padding, especially when processing short signals.

Feb 26 '23 07:02 gormat

@mthrok Maybe the reason why this was implemented this way is because torch.stft discards trailing samples if they do not fit a frame. This is specially noticeable when using center=False as pointed out in pytorch/pytorch#70073 and in #3003.

According to @nateanl in pytorch/pytorch#70073, this is "mitigated by defining the length argument in torch.istft, which pads zero tensors after the inverted signal". But to me it makes more sense to pad the input of torch.stft such that we have an integer number of frames (which might not need to be done on both sides), then analyze-synthesize, and only then trim if needed to match the original input length. It's less "lossy", and this might be why the two types of padding in torchaudio.functional.spectrogram.

There is still the question of whether the padding modes should match. Currently the first padding mode is hardcoded to "constant" as pointed out by OP.

Apr 04 '23 09:04 philgzl

audio audio copied to clipboard

Hardcoded padding mode in functional.spectrogram

audio
audio copied to clipboard