AutoPST
How to find mean and std of MFCC?
The mean and std I computed are different from the values in the mfcc_stats.pkl you provided.
Can you please check if I am doing something wrong?
I have attached a short code snippet below.
thanks.
import pickle
import librosa
import numpy as np
import scipy.fftpack
import soundfile as sf
from tqdm import tqdm

mfcc_list = []
for path in tqdm(wav_path):
    wav, sampling_rate = sf.read(path)
    mfcc = librosa.feature.mfcc(y=wav, sr=sampling_rate, n_mfcc=80, n_fft=1024, hop_length=256)  # [80, T]
    mfcc_list.append(mfcc)
mfcc_list = np.concatenate(mfcc_list, axis=1)  # [80, T_total]
mfcc_mean = mfcc_list.mean(axis=1)  # [80]
mfcc_std = mfcc_list.std(axis=1)  # [80]
dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')  # [80, 80]
with open('assets/mfcc_stats.pkl', 'wb') as f:
    pickle.dump([mfcc_mean, mfcc_std, dctmx], f, pickle.HIGHEST_PROTOCOL)
This is expected, because you computed the MFCCs in a different way.
Thanks for your reply.
Hello, can you please tell us what the correct way to generate mfcc_stats is?
@avanitanna Just compute the mean and std of the mfcc feature.
@auspicious3000 I understand. How should I go from wav files to computing mfcc features and their mean and std? Do you have a script that we can use? I would love to use your work and cite it but it is a little difficult to get the code to work with new training data. I would appreciate your help!
import numpy as np
import scipy.fftpack

# DCT-II matrix, shape [80, 80]
dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')
# compute mfcc stats using all spectrograms; sp_all has shape [T_total, 80]
mfcc_all = sp_all.dot(dctmx)
mfcc_mean, mfcc_std = np.mean(mfcc_all, axis=0), np.std(mfcc_all, axis=0)
# normalize each mfcc; sp_tmp has shape [T, 80]
cc_tmp = sp_tmp.dot(dctmx)
cc_norm = (cc_tmp - mfcc_mean) / mfcc_std
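For anyone who wants to try the recipe above end to end, here is a minimal sketch on synthetic data. The random arrays are only stand-ins for real mel-spectrograms, and the [T, 80] layout (time frames along axis 0) is inferred from the `.dot(dctmx)` and `axis=0` calls in the snippet, so treat both as assumptions rather than the repository's actual preprocessing.

```python
import numpy as np
import scipy.fftpack

rng = np.random.default_rng(0)

# Stand-ins for real data: three random "mel-spectrograms", each [T_i, 80].
# In practice these would come from your own feature extraction.
spectrograms = [rng.random((t, 80)) for t in (120, 95, 143)]

# Orthonormal DCT-II matrix, shape [80, 80].
dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')

# Concatenate along the time axis, then project every frame.
sp_all = np.concatenate(spectrograms, axis=0)   # [358, 80]
mfcc_all = sp_all.dot(dctmx)                    # [358, 80]
mfcc_mean = mfcc_all.mean(axis=0)               # [80]
mfcc_std = mfcc_all.std(axis=0)                 # [80]

# Normalize one utterance with the corpus-level statistics.
sp_tmp = spectrograms[0]
cc_norm = (sp_tmp.dot(dctmx) - mfcc_mean) / mfcc_std

# Sanity check: the normalized corpus has zero mean and unit std per coefficient.
all_norm = (mfcc_all - mfcc_mean) / mfcc_std
```

Note that the statistics are computed once over the whole corpus and then reused for every utterance, so a single utterance such as `cc_norm` will generally not have exactly zero mean on its own.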
@auspicious3000 Thank you! How do you get sp_all, and what is sp_tmp? Is sp_all a concatenation of all spectrograms? How do I create it? Does the following make sense?
Say I have multiple spectrograms:
mfcc_list = []
for file_name in ['p225_003.npy', 'p225_008.npy', ...]:
    f = np.load(file_name)  # one spectrogram per file
    mfcc_list.append(f)
sp_all = np.concatenate(mfcc_list, axis=0)
mfcc_all = sp_all.dot(dctmx)
@avanitanna sp_all is the concatenation of all mel spectrograms; sp_tmp is the spectrogram you want to normalize.
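Putting the answers in this thread together, a stats file could be generated roughly as sketched below. The helper names `compute_mfcc_stats` and `save_mfcc_stats` are hypothetical, the spectrograms are assumed to be saved as [T, 80] .npy files, and the `[mean, std, dctmx]` pickle layout is copied from the snippet in the original question rather than confirmed against the repository.

```python
import pickle

import numpy as np
import scipy.fftpack


def compute_mfcc_stats(spec_paths, n_mels=80):
    """Return (mean, std, dctmx) for mel-spectrograms stored as [T, n_mels] .npy files.

    Hypothetical helper: the file layout is an assumption, not the repo's format.
    """
    dctmx = scipy.fftpack.dct(np.eye(n_mels), type=2, axis=1, norm='ortho')
    # sp_all: every spectrogram stacked along the time axis.
    sp_all = np.concatenate([np.load(p) for p in spec_paths], axis=0)
    mfcc_all = sp_all.dot(dctmx)
    return mfcc_all.mean(axis=0), mfcc_all.std(axis=0), dctmx


def save_mfcc_stats(spec_paths, out_path):
    """Write [mean, std, dctmx] to a pickle (layout assumed from the question)."""
    mean, std, dctmx = compute_mfcc_stats(spec_paths)
    with open(out_path, 'wb') as f:
        pickle.dump([mean, std, dctmx], f, pickle.HIGHEST_PROTOCOL)
```

Usage would look like `save_mfcc_stats(['p225_003.npy', 'p225_008.npy'], 'assets/mfcc_stats.pkl')`, with the list replaced by the training set's spectrogram files.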