autovc

Hyperparameters for generating mel spectrogram from training .wav files

Open sroutray opened this issue 6 years ago • 3 comments

Could you please tell us how you generated mel spectrograms for training from .wav files? What were the parameters used?

sroutray avatar Aug 27 '19 17:08 sroutray

#4

auspicious3000 avatar Aug 28 '19 16:08 auspicious3000

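import librosa
import numpy as np

# Load the utterance, resampled to 16 kHz
y, sr = librosa.load('p225_001.wav', sr=16000)
# 80-band mel spectrogram (y passed by keyword, which newer librosa versions require)
S = librosa.feature.melspectrogram(y=y, sr=16000, n_mels=80, fmin=90, fmax=7600, n_fft=1024, hop_length=256)
# Convert to dB with a 1e-5 floor (-100 dB), then subtract ref_level_db = 16
S_r0 = 20 * np.log10(np.maximum(1e-5, S))
S_r0 = S_r0 - 16
# Rescale from [min_level_db, 0] = [-100, 0] to [0, 1]
S_r0 = np.clip((S_r0 + 100.0) / 100.0, 0, 1)
print(np.min(S_r0), np.max(S_r0), S_r0.shape)

# Synthesize with the provided WaveNet vocoder (model and wavegen are loaded elsewhere)
waveform = wavegen(model, c=S_r0.T)
librosa.output.write_wav('test_r0.wav', waveform, sr=16000)  # librosa.output was removed in librosa 0.8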

I am using the code above to generate the mel spectrogram of the file p225_001.wav, with the following parameters:

num_mels: 80
fmin: 90
fmax: 7600
fft_size: 1024
hop_size: 256
min_level_db: -100
ref_level_db: 16

However, the generated spectrogram is not the same as the one provided in metadata.pkl. I also tried passing both spectrograms through the provided WaveNet vocoder model, and the audio generated from my spectrogram is lower in quality than the audio generated from the spectrogram in metadata.pkl.
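For reference, the normalization step described here can be written with min_level_db and ref_level_db as named parameters, which makes it easier to see how each value enters the computation. This is only a sketch of the code in this comment, not the maintainers' preprocessing; the function names `normalize_db` and `compare_spectrograms` are hypothetical, and the comparison helper simply assumes both spectrograms are NumPy arrays in the same orientation.

```python
import numpy as np

def normalize_db(S, ref_level_db=16.0, min_level_db=-100.0):
    """Map a linear-scale spectrogram to [0, 1], mirroring the snippet in this comment."""
    # Floor that corresponds to min_level_db: -100 dB -> 1e-5
    floor = 10.0 ** (min_level_db / 20.0)
    # Convert to dB and shift by the reference level
    S_db = 20.0 * np.log10(np.maximum(floor, S)) - ref_level_db
    # Rescale [min_level_db, 0] to [0, 1] and clip anything outside
    return np.clip((S_db - min_level_db) / -min_level_db, 0.0, 1.0)

def compare_spectrograms(a, b):
    """Report whether two spectrograms share a shape and, if so, their largest gap."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return {"same_shape": False, "shapes": (a.shape, b.shape)}
    return {"same_shape": True, "max_abs_diff": float(np.max(np.abs(a - b)))}
```

With the defaults, an input value of 1.0 (0 dB) maps to 0.84, and values at or below the floor map to 0. Since the code above passes the transposed array to the vocoder, one of the two spectrograms may need a `.T` before calling `compare_spectrograms`.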

sroutray avatar Aug 28 '19 18:08 sroutray

See the last few comments in #4.

auspicious3000 avatar Aug 29 '19 11:08 auspicious3000