
Support custom padding in istft

Open yuzhms opened this issue 4 years ago • 6 comments

🚀 Feature

The current version of istft always enforces the NOLA (nonzero overlap-add) check, so some kinds of padding (e.g. asymmetrical padding, or #427) raise an assertion error. I would like a way to support custom padding, perhaps simply an option to manually disable the NOLA check.
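For context on the NOLA check: NOLA requires the overlap-added squared window, which istft divides by, to be nonzero at every output sample. Below is a minimal numpy sketch of that envelope. It assumes a periodic Hann window zero-padded (centered) to n_fft, the way torch.stft handles win_length < n_fft; hann_periodic, window_envelope and nola_holds are hypothetical helpers, not torch or torchaudio APIs. With n_fft=512, win_length=128, hop_length=64 and no centering, the envelope is zero over the first 193 samples (and over a similar stretch at the end), so a check over the whole output fails even though the interior is invertible.

```python
import numpy as np


def hann_periodic(n):
    # Periodic Hann window, like torch.hann_window(n) with its default periodic=True.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def window_envelope(n_frames, n_fft, hop_length, win_length):
    """Overlap-added squared window: the normalizer an istft divides by."""
    # Zero-pad the window to n_fft, centered, as torch.stft does when win_length < n_fft.
    left = (n_fft - win_length) // 2
    padded = np.zeros(n_fft)
    padded[left:left + win_length] = hann_periodic(win_length)

    out_len = n_fft + hop_length * (n_frames - 1)
    env = np.zeros(out_len)
    for t in range(n_frames):
        env[t * hop_length:t * hop_length + n_fft] += padded ** 2
    return env


def nola_holds(env, tol=1e-11):
    # NOLA: every sample of the envelope must be (numerically) nonzero.
    return bool(np.all(env > tol))


env = window_envelope(n_frames=10, n_fft=512, hop_length=64, win_length=128)
print(nola_holds(env))         # False
print(np.flatnonzero(env)[0])  # 193
```

Checking only a trimmed region, e.g. nola_holds(env[193:896]), passes for these settings.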

yuzhms, Apr 03 '20 12:04

In a comment, I suggested experimenting with disabling the NOLA check. You can run Python without assertion checks (e.g. with python -O). If you do so, do you get what you expect? Can you provide a test case for what you would like?

vincentqb, Apr 03 '20 15:04


I found it more useful to add two parameters, start and end, and to modify this line as follows:

start = half_n_fft if center else start
end = -half_n_fft if end is None else end

After this modification, I can get what I want.

Here is the test case:

import math
import torch

n_fft = 512
hop_length = 64
win_length = 128
signal_length = 48000
signal = torch.randn(1, signal_length)

# Left pad so that, in the first frame, the zero-padded window starts at signal[0]
pad_left = (n_fft - win_length) // 2
# Right pad up to a whole number of hops, plus the same margin as on the left
pad_right = int(math.ceil(signal_length / hop_length)) * hop_length - signal_length + pad_left

spectrogram = torch.stft(
    torch.nn.functional.pad(signal, [pad_left, pad_right]),
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=win_length,
    window=torch.hann_window(win_length),
    center=False)
# istft is the locally modified torchaudio istft with the proposed start/end arguments
reconstructed_signal = istft(
    spectrogram,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=win_length,
    window=torch.hann_window(win_length),
    center=False,
    start=pad_left+1,
    end=signal_length+pad_left)
# signal[:, 0] is skipped: the Hann window is zero at that position, so it cannot be recovered
torch.testing.assert_allclose(signal[:, 1:], reconstructed_signal, atol=1e-4, rtol=1e-9)
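The point of start and end is that the NOLA check only needs to pass on the samples that are kept. The numpy sketch below mirrors the test case above with a shorter signal. istft_trim is a hypothetical overlap-add inverse written for illustration (it is not torchaudio's istft); it trims the output to [start:end] before checking the envelope. It assumes a periodic Hann window zero-padded to n_fft and center=False framing.

```python
import numpy as np


def padded_hann(n_fft, win_length):
    # Periodic Hann of length win_length, zero-padded (centered) to n_fft.
    win = np.zeros(n_fft)
    left = (n_fft - win_length) // 2
    k = np.arange(win_length)
    win[left:left + win_length] = 0.5 - 0.5 * np.cos(2 * np.pi * k / win_length)
    return win


def stft_no_center(x, n_fft, hop_length, win_length):
    # Frame x without extra padding (like center=False); returns (frames, bins).
    win = padded_hann(n_fft, win_length)
    n_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack([x[t * hop_length:t * hop_length + n_fft] * win
                       for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft_trim(spec, n_fft, hop_length, win_length, start=0, end=None, tol=1e-11):
    # Hypothetical inverse: overlap-add, trim to [start:end], THEN check NOLA.
    win = padded_hann(n_fft, win_length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    out_len = n_fft + hop_length * (len(frames) - 1)
    y = np.zeros(out_len)
    env = np.zeros(out_len)
    for t, frame in enumerate(frames):
        y[t * hop_length:t * hop_length + n_fft] += frame
        env[t * hop_length:t * hop_length + n_fft] += win ** 2
    y, env = y[start:end], env[start:end]
    if env.min() <= tol:
        raise ValueError(f"window overlap add min: {env.min()}")
    return y / env


n_fft, hop_length, win_length = 512, 64, 128
signal_length = 4800
rng = np.random.default_rng(0)
signal = rng.standard_normal(signal_length)

pad_left = (n_fft - win_length) // 2
pad_right = -(-signal_length // hop_length) * hop_length - signal_length + pad_left
padded = np.pad(signal, (pad_left, pad_right))

spec = stft_no_center(padded, n_fft, hop_length, win_length)
rec = istft_trim(spec, n_fft, hop_length, win_length,
                 start=pad_left + 1, end=signal_length + pad_left)
# signal[0] has a zero envelope, so it is excluded from the comparison
print(np.allclose(signal[1:], rec, atol=1e-8))  # True
```

Calling istft_trim without start and end raises the same kind of "window overlap add min" error, because the untrimmed output includes the zero-envelope edges.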

yuzhms, Apr 05 '20 03:04

@mthrok -- do you want to look into this?

vincentqb, Apr 08 '20 14:04

I am trying to perfectly align an original signal with istft(stft(signal)) in PyTorch. I can do it like this in librosa:

import librosa
import numpy as np

n_fft = 2048
win_length = n_fft
hop_length = 1024
sample_rate = 44100
overlap = win_length - hop_length

def end_pad(length):
    # Samples to append so that (length - overlap) is a multiple of hop_length
    return hop_length - (length - overlap) % hop_length

# example_audio_file() is the older librosa API; newer releases provide librosa.ex(...)
y, _ = librosa.load(librosa.util.example_audio_file(), sr=sample_rate)
y = np.pad(y, (n_fft // 2, 0), mode='reflect')       # left pad by half a frame
y = np.pad(y, (0, end_pad(len(y))), mode='reflect')  # right pad to a whole number of hops

stft = librosa.stft(
    y,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=win_length,
    window='hann',
    center=False
)

y_hat = librosa.istft(
    stft,
    hop_length=hop_length,
    win_length=win_length,
    window='hann',
    center=False
)

When I plot and compare random sections of y and y_hat, this is the result in librosa: [figure: y and y_hat]
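end_pad appends just enough samples that (length - overlap) becomes a multiple of hop_length. Since overlap = n_fft - hop_length here, that is equivalent to the last center=False frame ending exactly on the last sample, so no trailing samples are dropped. A small pure-Python check of the arithmetic, reusing the snippet's constants (the input length is arbitrary and n_frames is a hypothetical helper):

```python
n_fft = 2048
win_length = n_fft
hop_length = 1024
overlap = win_length - hop_length


def end_pad(length):
    # Same formula as in the snippet above: samples to append at the end.
    return hop_length - (length - overlap) % hop_length


def n_frames(length):
    # Number of full frames when framing without centering (center=False).
    return 1 + (length - n_fft) // hop_length


length = 100_000 + n_fft // 2  # arbitrary length after the left pad
padded_length = length + end_pad(length)
last_frame_end = (n_frames(padded_length) - 1) * hop_length + n_fft
print(end_pad(length), padded_length, last_frame_end)  # 352 101376 101376
```

One quirk: when the length is already aligned, end_pad returns a full hop_length rather than 0 (for example, end_pad(2048) == 1024). If that extra hop is unwanted, (-(length - overlap)) % hop_length gives the same result otherwise and 0 in the aligned case.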

If I pad my signal the same way and use PyTorch's stft and istft functions, I get this error:

RuntimeError: istft(torch.FloatTensor[1, 1025, 2647, 2], n_fft=2048, hop_length=1024, win_length=2048, window=torch.FloatTensor{[2048]}, center=0, normalized=0, onesided=1, length=None)window overlap add min: 0
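One way to see where "window overlap add min: 0" comes from: with center=False and win_length == n_fft, the first output sample is covered only by the first frame, and a periodic Hann window (the torch.hann_window default) is exactly zero at index 0. A small numpy sketch of the squared-window envelope for these settings (the frame count is arbitrary):

```python
import numpy as np

n_fft = 2048
hop_length = 1024
n_frames = 5  # arbitrary; the zero at the start does not depend on it

# Periodic Hann window, as returned by torch.hann_window(n_fft) by default.
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)

# Overlap-add the squared window over all frames (the istft normalizer).
env = np.zeros(n_fft + hop_length * (n_frames - 1))
for t in range(n_frames):
    env[t * hop_length:t * hop_length + n_fft] += window ** 2

print(env.min(), env.argmin())  # 0.0 0
print(env[1:].min() > 0)        # True: every later sample is covered
```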

I couldn't get y and y_hat to align without custom padding, no matter what arguments I gave to stft and istft.

I am guessing this is a related issue.

kureta, Jun 09 '20 13:06

Assuming PyTorch adds n_fft // 2 samples of padding at both ends (increasing the signal length by n_fft), right-padding the signal with y = np.pad(y, (0, end_pad(len(y) + n_fft)), mode='reflect') and setting center=True solved the alignment problem.
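Why center=True helps: the n_fft // 2 samples added on each side absorb the zero (or near-zero) envelope samples at the edges, and they are trimmed away again after overlap-add. Below is a minimal numpy sketch of that round trip, assuming reflect padding of n_fft // 2 on both sides (torch.stft's default pad_mode is 'reflect') and a periodic Hann window; stft_center and istft_center are hypothetical helpers, not the torch implementations. The input is right-padded with end_pad(len(y) + n_fft) as described above, which makes its length a multiple of hop_length.

```python
import numpy as np

n_fft = 2048
hop_length = 1024
overlap = n_fft - hop_length
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann


def end_pad(length):
    # Samples to append so that (length - overlap) is a multiple of hop_length.
    return hop_length - (length - overlap) % hop_length


def stft_center(y):
    # center=True: reflect-pad n_fft // 2 on both sides, then frame and transform.
    y = np.pad(y, (n_fft // 2, n_fft // 2), mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([y[t * hop_length:t * hop_length + n_fft] * window
                       for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft_center(spec):
    # Overlap-add, trim the centering padding, check NOLA, then normalize.
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out_len = n_fft + hop_length * (len(frames) - 1)
    out = np.zeros(out_len)
    env = np.zeros(out_len)
    for t, frame in enumerate(frames):
        out[t * hop_length:t * hop_length + n_fft] += frame
        env[t * hop_length:t * hop_length + n_fft] += window ** 2
    half = n_fft // 2
    out, env = out[half:-half], env[half:-half]
    if env.min() <= 1e-11:
        raise ValueError("NOLA violated inside the kept region")
    return out / env


rng = np.random.default_rng(0)
y = rng.standard_normal(20_000)
y = np.pad(y, (0, end_pad(len(y) + n_fft)), mode="reflect")
y_hat = istft_center(stft_center(y))
print(len(y) == len(y_hat), np.allclose(y, y_hat))  # True True
```

Because the trimmed region is always covered by two overlapping frames at 50% overlap, the envelope there never drops below 0.5, so no sample needs to be skipped.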

kureta, Jun 09 '20 13:06

@kureta perhaps I'm a bit late, but I provided a simpler (maybe?) workaround in pytorch/pytorch#62323

miccio-dk, Jan 06 '22 13:01