autovc
Hyperparameters for generating mel spectrogram from training .wav files
Could you please tell us how you generated mel spectrograms for training from .wav files? What were the parameters used?
#4
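import librosa
import numpy as np

y, sr = librosa.load('p225_001.wav', sr=16000)
# Note: melspectrogram defaults to power=2.0, so S is a power spectrogram.
S = librosa.feature.melspectrogram(y=y, sr=16000, n_mels=80, fmin=90, fmax=7600, n_fft=1024, hop_length=256)
# Convert to dB with a 1e-5 floor, subtract ref_level_db (16), then map [-100, 0] dB onto [0, 1].
S_r0 = 20 * np.log10(np.maximum(1e-5, S))
S_r0 = S_r0 - 16
S_r0 = np.clip((S_r0 + 100.0) / 100.0, 0, 1)
print(np.min(S_r0), np.max(S_r0), S_r0.shape)
# model and wavegen are assumed to come from the provided WaveNet vocoder setup.
waveform = wavegen(model, c=S_r0.T)
# librosa.output was removed in librosa 0.8; on newer versions use soundfile.write instead.
librosa.output.write_wav('test_r0.wav', waveform, sr=16000)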
I am using the code above to generate the mel spectrogram for p225_001.wav, with the following parameters:
num_mels: 80
fmin: 90
fmax: 7600
fft_size: 1024
hop_size: 256
min_level_db: -100
ref_level_db: 16
However, the resulting spectrogram is not the same as the one provided in metadata.pkl. I also passed both spectrograms through the provided WaveNet vocoder model, and the audio generated from my spectrogram is noticeably lower in quality than the audio generated from the spectrogram in metadata.pkl.
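To make the role of min_level_db and ref_level_db explicit, here is a minimal sketch of the normalization step from the posted code, written for a single value in plain Python. The function name `normalize_mel_value` and the constant names are hypothetical, chosen only for illustration. This mirrors the arithmetic in the post and is not necessarily how the spectrograms in metadata.pkl were produced.

```python
import math

# Values taken from the parameter list in the post above.
MIN_LEVEL_DB = -100.0
REF_LEVEL_DB = 16.0
FLOOR = 1e-5  # clamp applied before the log in the posted code


def normalize_mel_value(s, ref_level_db=REF_LEVEL_DB, min_level_db=MIN_LEVEL_DB):
    """Map one mel-spectrogram value to [0, 1], mirroring the posted code.

    Hypothetical helper: the scalar equivalent of
    np.clip((20 * np.log10(np.maximum(1e-5, S)) - 16 + 100) / 100, 0, 1).
    """
    # Convert to dB with a floor, then subtract the reference level.
    db = 20.0 * math.log10(max(FLOOR, s)) - ref_level_db
    # Shift and scale so that min_level_db maps to 0 and 0 dB maps to 1.
    scaled = (db - min_level_db) / (-min_level_db)
    # Clip to the [0, 1] range.
    return min(1.0, max(0.0, scaled))


if __name__ == "__main__":
    for value in (0.0, 1e-5, 1.0, 100.0):
        print(value, normalize_mel_value(value))
```

With these settings an input of 1.0 lands at 0.84, and anything at or below the 1e-5 floor is clipped to 0. Because the 20 * log10 conversion assumes an amplitude-like input, the result depends on whether S is an amplitude or a power spectrogram, which may be worth checking when comparing against metadata.pkl.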
See the last few comments in #4.