audio
Support custom padding in istft
🚀 Feature
The current version of istft enforces the NOLA check, so some kinds of padding (e.g. asymmetrical padding or #427) raise an assertion error. I would like a way to support custom padding, perhaps just an option to manually disable the NOLA check.
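For context, here is a minimal sketch of what a NOLA-style check looks at, assuming the check is applied to the overlap-added squared window (the "window envelope"). The helper names `hann` and `window_envelope` are made up for this illustration and are not part of torchaudio or PyTorch. It shows why an uncentered transform with a Hann window trips the check at the signal edges while the interior is fine.

```python
import math

def hann(win_length):
    # Periodic Hann window; its first sample is exactly zero.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / win_length)
            for n in range(win_length)]

def window_envelope(window, hop_length, n_frames):
    # Overlap-add the squared window at every frame position.
    length = len(window) + hop_length * (n_frames - 1)
    envelope = [0.0] * length
    for frame in range(n_frames):
        offset = frame * hop_length
        for n, w in enumerate(window):
            envelope[offset + n] += w * w
    return envelope

window = hann(128)
envelope = window_envelope(window, hop_length=64, n_frames=10)

# Without centering, the very first output sample is covered only by the
# zero at the start of the first window, so the envelope minimum is zero.
print("min over full output:", min(envelope))
# Once the first and last hop are trimmed, every sample is covered by
# two overlapping windows and the envelope stays well away from zero.
print("min after trimming one hop per side:", min(envelope[64:-64]))
```

A check over the full output fails because of the edge samples alone, which is why trimming (or skipping the check) changes the outcome.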
In a comment, I suggested experimenting with disabling NOLA. You can run Python without assertion checks. If you do so, do you get what you expect? Can you provide a test case for what you would like?
I found it more useful to get what I want by adding two additional parameters, start and end, and modifying this line as
start = half_n_fft if center else start
end = -half_n_fft if end is None else end
After this modification, I can get what I want.
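A rough sketch of the idea behind the start and end parameters, assuming the NOLA check and the normalization are applied only to the region that is kept. `trim_and_normalize` is a hypothetical helper written for this illustration, not the actual torchaudio code, and it works on plain Python lists.

```python
def trim_and_normalize(overlap_added, envelope, start=0, end=None, eps=1e-11):
    # Keep only the requested region of the overlap-added signal.
    y = overlap_added[start:end]
    env = envelope[start:end]
    # NOLA-style check, restricted to the kept region.
    if min(abs(e) for e in env) <= eps:
        raise ValueError("window overlap add min too small in the kept region")
    # Undo the window weighting sample by sample.
    return [s / e for s, e in zip(y, env)]

# Toy data: the envelope is zero at both edges and positive in between.
toy_signal = [0.0, 1.0, 3.0, 1.0, 0.0]
toy_envelope = [0.0, 0.5, 1.0, 0.5, 0.0]
trimmed = trim_and_normalize(toy_signal, toy_envelope, start=1, end=-1)
```

With start=0 and end=None the same toy input raises, because the zero-envelope edge samples are still part of the checked region.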
Here is the test case:
import math
import torch
# istft below is the modified function with the extra start and end parameters
n_fft = 512
hop_length = 64
win_length = 128
signal_length = 48000
signal = torch.randn(1, signal_length)
pad_left = (n_fft - win_length) // 2
pad_right = int(math.ceil(signal_length / hop_length)) * hop_length - signal_length + pad_left
spectrogram = torch.stft(
torch.nn.functional.pad(signal, [pad_left, pad_right]),
n_fft=n_fft,
hop_length=hop_length,
win_length=win_length,
window=torch.hann_window(win_length),
center=False)
reconstructed_signal = istft(
spectrogram,
n_fft=n_fft,
hop_length=hop_length,
win_length=win_length,
window=torch.hann_window(win_length),
center=False,
start=pad_left+1,
end=signal_length+pad_left)
torch.testing.assert_allclose(signal[:, 1:], reconstructed_signal, atol=1e-4, rtol=1e-9)
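The numbers in this test case can be worked through without torch. The sketch below assumes that a window shorter than n_fft is zero-padded symmetrically inside each frame (as the torch.stft documentation describes) and that the Hann window is periodic, so its first sample is zero. It is only an illustration of why start is pad_left+1 and why the comparison uses signal[:, 1:].

```python
import math

n_fft, hop_length, win_length, signal_length = 512, 64, 128, 48000

pad_left = (n_fft - win_length) // 2
pad_right = (math.ceil(signal_length / hop_length) * hop_length
             - signal_length + pad_left)
padded_length = signal_length + pad_left + pad_right

# Number of frames for an uncentered STFT over the padded signal.
n_frames = 1 + (padded_length - n_fft) // hop_length

# Periodic Hann window of length win_length.
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_length)
          for n in range(win_length)]

# Squared-window envelope; the window sits pad_left samples into each frame.
envelope = [0.0] * (n_fft + hop_length * (n_frames - 1))
for frame in range(n_frames):
    offset = frame * hop_length + pad_left
    for n, w in enumerate(window):
        envelope[offset + n] += w * w

start = pad_left + 1
end = signal_length + pad_left

print(pad_left, pad_right, n_frames)
# The original sample 0 lands at index pad_left, where the envelope is zero.
print(envelope[pad_left])
# Everything from start up to end has a non-zero envelope.
print(min(envelope[start:end]))
```

Under these assumptions the first original sample cannot be recovered, which matches dropping it on both sides of the comparison.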
@mthrok -- do you want to look into this?
I am trying to perfectly align an original signal with istft(stft(signal)) in pytorch. I am able to do it like this in librosa:
import librosa
import numpy as np

n_fft = 2048
win_length = n_fft
hop_length = 1024
sample_rate = 44100
overlap = win_length - hop_length
def end_pad(length):
return hop_length - (length - overlap) % hop_length
y, _ = librosa.load(librosa.util.example_audio_file(), sr=sample_rate)
y = np.pad(y, (n_fft // 2, 0), mode='reflect')
y = np.pad(y, (0, end_pad(len(y))), mode='reflect')
stft = librosa.stft(
y,
n_fft=n_fft,
hop_length=hop_length,
win_length=win_length,
window='hann',
center=False
)
y_hat = librosa.istft(
stft,
hop_length=hop_length,
win_length=win_length,
window='hann',
center=False
)
When I plot and compare random sections from y and y_hat this is the result in librosa:
[Image: y and y_hat]
If I pad my signal the same way and use pytorch's stft and istft functions I get this error:
RuntimeError: istft(torch.FloatTensor[1, 1025, 2647, 2], n_fft=2048, hop_length=1024, win_length=2048, window=torch.FloatTensor{[2048]}, center=0, normalized=0, onesided=1, length=None)window overlap add min: 0
I couldn't get y and y_hat to align without custom padding, no matter what arguments I gave to stft and istft.
I am guessing this is a related issue.
Assuming pytorch adds n_fft // 2 of padding at both ends, which increases the signal length by n_fft, I solved the alignment problem by right-padding the signal like this: y = np.pad(y, (0, end_pad(len(y)+n_fft)), mode='reflect') and setting center=True.
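The arithmetic behind end_pad can be checked on its own. This sketch reuses the end_pad definition from the librosa snippet and adds a hypothetical helper, `leftover`, that counts the samples left past the last full frame. It assumes center=True adds n_fft // 2 samples on each side before framing, and the example lengths are arbitrary.

```python
n_fft = 2048
win_length = n_fft
hop_length = 1024
overlap = win_length - hop_length

def end_pad(length):
    # Right padding that makes (length - overlap) a multiple of hop_length.
    return hop_length - (length - overlap) % hop_length

def leftover(framed_length):
    # Samples not covered by any full frame when framing framed_length samples.
    return (framed_length - n_fft) % hop_length

results = {}
for length in (44100, 100000, 123457):
    # center=False: the padded signal itself is what gets framed.
    uncentered = length + end_pad(length)
    # center=True: framing sees n_fft extra samples, so pad for that length.
    centered = length + end_pad(length + n_fft) + n_fft
    results[length] = (leftover(uncentered), leftover(centered))

print(results)
```

In both cases the framed length is an exact number of hops past one frame, so no trailing samples are dropped. Note that end_pad returns a full hop_length, not zero, when the length is already aligned.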
@kureta perhaps I'm a bit late but I provided a simpler (maybe?) workaround in pytorch/pytorch#62323