PyTSMod
[BUG] tdpsola does not work properly for low beta values
Describe the bug
When lowering the pitch of a voice with tsm.tdpsola using a low beta factor, the pitch stays the same and clicking artifacts are audible. There are pitch-shifting plugins that use TD-PSOLA and allow even lower pitch shifts, so I don't think this is a limitation of the algorithm.
To Reproduce
Code to reproduce the behavior:
import numpy as np
import librosa
import soundfile as sf
import matplotlib.pyplot as plt
import pytsmod as tsm
n_fft = 1024
hop_length_factor = 4
file_path = "audio.flac"
print("Loading audio file...")
audio, sr = librosa.load(file_path, sr=None, mono=True)
print(sr)
hop_length = n_fft // hop_length_factor
print("pyin")
f0, _, _ = librosa.pyin(
    audio,
    sr=sr,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    frame_length=n_fft,
    hop_length=hop_length,
)
mask = np.isnan(f0)
# linearly interpolate pitch in place of nans
f0[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), f0[~mask])
audio_stft = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
# convert f0 from Hz to an STFT bin index so it lines up with the spectrogram rows
f0_stft = f0 * n_fft / sr
# plot spectrogram and f0
spect = librosa.amplitude_to_db(np.abs(audio_stft), ref=np.max)
fig, ax = plt.subplots()
img = librosa.display.specshow(spect, x_axis="time", ax=ax, sr=sr, hop_length=hop_length)
fig.colorbar(img, ax=ax, format="%2.f")
ax.plot(librosa.times_like(f0_stft, sr=sr, hop_length=hop_length), f0_stft, label="f0", color="cyan")
plt.show()
audio = tsm.tdpsola(audio, sr, f0, beta=0.5, p_hop_size=hop_length, p_win_size=n_fft)
sf.write("tdpsola test.wav", audio, sr)
Desktop:
- OS: Windows 11
- Python version: Python 3.12.4
- PyTSMod version: 0.3.8
Thank you for your report. PSOLA in pytsmod is a re-implementation of the MATLAB implementation in the book DAFX: Digital Audio Effects.
It isn't easy to improve the algorithm without a reference, so if you know of any reference code or examples, please share them with me.
I cannot promise that it will be fixed in the near future, but I will look into it.
I think this is exactly the same problem as with this implementation: https://dsp.stackexchange.com/questions/61687/problem-using-pitch-shifting-with-td-psola-and-formant-preservation
PSOLA should not try to fill gaps for low pitches. It should leave blank spaces. It sounds very unintuitive, but it's just a limitation of the algorithm itself. I don't have any reference code because I haven't found any correct implementation yet lol
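To make the point about gaps concrete, here is a rough toy sketch. It is not PyTSMod's code and not the DAFX implementation. It assumes a constant, known f0, evenly spaced pitch marks, and grains that span two source periods; the function name psola_constant_pitch is made up for illustration. Grains are placed every src_period / beta samples, so once beta drops below 0.5 the spacing exceeds the grain length and silence is left between grains instead of being filled.

```python
import numpy as np

def psola_constant_pitch(x, sr, f0, beta):
    """Toy TD-PSOLA for a signal with a constant, known f0 (illustrative assumption)."""
    src_period = int(round(sr / f0))            # source pitch period in samples
    tgt_period = int(round(src_period / beta))  # synthesis spacing; beta < 1 lowers pitch
    win = np.hanning(2 * src_period + 1)        # each grain spans two source periods
    # evenly spaced analysis pitch marks (a real implementation would detect these)
    marks = np.arange(src_period, len(x) - src_period, src_period)
    y = np.zeros(len(x))
    t = src_period
    while t < len(x) - src_period:
        # take the grain at the analysis mark closest to the synthesis position,
        # so the overall duration is preserved
        m = marks[np.argmin(np.abs(marks - t))]
        grain = win * x[m - src_period : m + src_period + 1]
        y[t - src_period : t + src_period + 1] += grain
        t += tgt_period
    return y, src_period, tgt_period

sr, f0 = 16000, 200.0
x = np.sin(2 * np.pi * f0 * np.arange(sr) / sr)
for beta in (1.0, 0.5, 0.25):
    y, src_p, tgt_p = psola_constant_pitch(x, sr, f0, beta)
    # fraction of output samples that are exactly zero, i.e. left blank
    print(beta, src_p, tgt_p, np.mean(y == 0))
```

In this sketch the grains overlap and reconstruct the input at beta=1.0, just touch at beta=0.5, and leave stretches of exact zeros at beta=0.25. Whether a given plugin handles low factors by using longer grains or some other trick is not something this sketch can say.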