autovc
What are the full preprocessing steps?
I'm retraining the model on my own data, but the output is all noise. I suspect the problem is in how I generate the mel-spectrograms. I generate them with librosa and also use librosa to invert the model's output back to raw audio.
Here are the functions I'm using to convert raw audio to a mel-spectrogram and back:
def normalize(S):
    return np.clip((S - hp.min_level_db) / -hp.min_level_db, 0, 1)

def denormalize(S):
    return (np.clip(S, 0, 1) * -hp.min_level_db) + hp.min_level_db

def amp_to_db(x):
    return 20 * np.log10(np.maximum(1e-5, x))

def db_to_amp(x):
    return np.power(10.0, x * 0.05)

def melspectrogram(y):
    S = librosa.feature.melspectrogram(y=y, sr=hp.sr, n_fft=hp.fft_size,
                                       hop_length=hp.hop_length, n_mels=hp.n_mels,
                                       fmin=hp.fmin, fmax=hp.fmax, power=hp.power)
    S = amp_to_db(S)
    S = normalize(S)
    return S

def inverse_melspectrogram(M):
    M = denormalize(M)
    M = db_to_amp(M)
    y = librosa.feature.inverse.mel_to_audio(M=M, sr=hp.sr, n_fft=hp.fft_size,
                                             hop_length=hp.hop_length, power=hp.power)
    return y
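One way to check the scaling functions on their own, without librosa or the model, is to push a few known amplitudes through the full chain and see what comes back. The sketch below is a scalar, NumPy-free restatement of the four helpers above, assuming min_level_db = -100 as in the hyperparameters in this post. The `round_trip` helper and the sample amplitudes are only illustrative.

```python
import math

MIN_LEVEL_DB = -100  # assumed equal to hp.min_level_db from this post


def normalize(s):
    # Map dB values in [MIN_LEVEL_DB, 0] onto [0, 1]; anything outside is clipped.
    return min(max((s - MIN_LEVEL_DB) / -MIN_LEVEL_DB, 0.0), 1.0)


def denormalize(s):
    # Inverse of normalize for values that were not clipped.
    return min(max(s, 0.0), 1.0) * -MIN_LEVEL_DB + MIN_LEVEL_DB


def amp_to_db(x):
    # Amplitudes below 1e-5 are floored before taking the log.
    return 20 * math.log10(max(1e-5, x))


def db_to_amp(x):
    return 10.0 ** (x * 0.05)


def round_trip(amplitude):
    # Hypothetical helper: amplitude -> dB -> [0, 1] -> dB -> amplitude.
    return db_to_amp(denormalize(normalize(amp_to_db(amplitude))))


if __name__ == "__main__":
    for amp in (1e-7, 1e-3, 0.5, 1.0, 10.0):
        print(f"{amp:g} -> {round_trip(amp):g}")
```

With these definitions, only amplitudes between 1e-5 and 1.0 survive the round trip. Smaller values come back as 1e-5 and larger values come back as 1.0, so it may be worth checking whether the mel magnitudes librosa produces for your audio fall inside that range.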
Here are the hyperparameters I'm using:
sr=16000
n_mels=80
fmin=90
fmax=7600
fft_size=1024
hop_length=256
min_level_db=-100
ref_level_db=20
PAD_VALUE = -100000
BATCH_SIZE = 32
MAX_FRAMES = 1024
power = 1.0
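For reference, the time and frequency resolution these values imply can be worked out directly. This is a small sketch that uses only the numbers listed above. The derived names (`hop_ms`, `window_ms`, and so on) are illustrative and are not part of the original hyperparameter set.

```python
# Values copied from the hyperparameter list in this post.
sr = 16000
fft_size = 1024
hop_length = 256
fmax = 7600
MAX_FRAMES = 1024

hop_ms = 1000 * hop_length / sr              # time step between consecutive frames
window_ms = 1000 * fft_size / sr             # length of each analysis window
n_freq_bins = fft_size // 2 + 1              # linear-frequency bins before the mel filterbank
nyquist = sr / 2                             # highest representable frequency
max_seconds = MAX_FRAMES * hop_length / sr   # audio covered by MAX_FRAMES frames

if __name__ == "__main__":
    print(f"hop: {hop_ms} ms, window: {window_ms} ms")
    print(f"STFT bins: {n_freq_bins}, Nyquist: {nyquist} Hz, fmax: {fmax} Hz")
    print(f"MAX_FRAMES covers {max_seconds} s of audio")
```

Any audio fed to these functions needs to be sampled at (or resampled to) 16 kHz for these figures to hold.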
Could you tell me if there is an issue with my preprocessing steps? If you need any more info, please ask.
Thanks
What do your input and output spectrograms look like?