How to find mean and std of MFCC?

Open insunhwang89 opened this issue 4 years ago • 8 comments

The mean and std I computed differ from the values in the mfcc_stats.pkl you provided.

Can you please check if I am doing something wrong?

I attached a simple code below.

thanks.


import pickle

import librosa
import numpy as np
import scipy.fftpack
import soundfile as sf
from tqdm import tqdm

mfcc_list = list()
for path in tqdm(wav_path):
    wav, sampling_rate = sf.read(path)
    mfcc = librosa.feature.mfcc(y=wav, sr=sampling_rate, n_mfcc=80, n_fft=1024, hop_length=256)  # [80, T]
    mfcc_list.append(mfcc)

mfcc_list = np.concatenate(mfcc_list, axis=1)  # [80, T]
mfcc_mean = mfcc_list.mean(axis=1)  # [80]
mfcc_std = mfcc_list.std(axis=1)  # [80]

dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')  # [80, 80]

with open('assets/mfcc_stats.pkl', 'wb') as f:
    pickle.dump([mfcc_mean, mfcc_std, dctmx], f, pickle.HIGHEST_PROTOCOL)

insunhwang89 avatar Sep 10 '21 00:09 insunhwang89

This is expected, because you computed the MFCCs in a different way.

auspicious3000 avatar Sep 10 '21 00:09 auspicious3000
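One way the numbers can diverge: `librosa.feature.mfcc` builds its own mel spectrogram internally (128 mel bands by default, converted to dB) before taking the DCT, so its input may differ in band count and scaling from the mel spectrograms used for the provided stats. The thread does not say how those spectrograms were extracted, so the sketch below only illustrates the scaling effect on synthetic data: the same frames expressed in a dB-like range and in a [0, 1] range (both hypothetical) give different MFCC statistics, because the DCT is linear and simply carries the rescaling through.

```python
import numpy as np
import scipy.fftpack

rng = np.random.default_rng(0)

# DCT-II basis for 80 mel bands, built the same way as in the original post.
dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')

# Hypothetical log-mel frames in a dB-like range, shape [T, 80].
sp_db = rng.uniform(-100.0, 0.0, size=(400, 80))
# The same frames mapped affinely to [0, 1] (an assumed normalization).
sp_unit = (sp_db + 100.0) / 100.0


def mfcc_stats(sp):
    """Per-coefficient mean and std of the DCT of each frame."""
    cc = sp.dot(dctmx)
    return cc.mean(axis=0), cc.std(axis=0)


mean_db, std_db = mfcc_stats(sp_db)
mean_unit, std_unit = mfcc_stats(sp_unit)

# The std differs by the scale factor for every coefficient; the constant
# offset only shows up in coefficient 0.
std_ratio = std_db / std_unit
print(std_ratio[:3], mean_db[0], mean_unit[0])
```

So even with an identical DCT step, stats computed from differently scaled spectrograms are not interchangeable.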

Thanks for your reply.

insunhwang89 avatar Sep 10 '21 00:09 insunhwang89

Hello, can you please tell us what the correct way to generate mfcc_stats is?

avanitanna avatar Oct 15 '22 21:10 avanitanna

@avanitanna Just compute the mean and std of the mfcc feature.

auspicious3000 avatar Oct 16 '22 01:10 auspicious3000

@auspicious3000 I understand. How should I go from wav files to computing mfcc features and their mean and std? Do you have a script that we can use? I would love to use your work and cite it but it is a little difficult to get the code to work with new training data. I would appreciate your help!

avanitanna avatar Oct 16 '22 03:10 avanitanna

import numpy as np
import scipy.fftpack

# DCT-II basis applied along the 80 mel bands
dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')

# compute mfcc stats using all spectrograms
mfcc_all = sp_all.dot(dctmx)
mfcc_mean, mfcc_std = np.mean(mfcc_all, axis=0), np.std(mfcc_all, axis=0)

# normalize each mfcc
cc_tmp = sp_tmp.dot(dctmx)
cc_norm = (cc_tmp - mfcc_mean) / mfcc_std

auspicious3000 avatar Oct 16 '22 04:10 auspicious3000
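A minimal, runnable sketch of the snippet above, using random numbers in place of real mel spectrograms. The `[T, 80]` layout (frames along axis 0) is inferred from the `axis=0` reductions and the 80x80 `dctmx`; the frame counts and random data are placeholders. It also checks that multiplying by `dctmx` is the same as applying an orthonormal DCT-II along the mel axis of each frame.

```python
import numpy as np
import scipy.fftpack

rng = np.random.default_rng(0)

# Stand-in for real data: 500 frames of an 80-band mel spectrogram, [T, 80].
sp_all = rng.random((500, 80))

# DCT-II basis, as in the reply above.
dctmx = scipy.fftpack.dct(np.eye(80), type=2, axis=1, norm='ortho')

# MFCC statistics over all frames.
mfcc_all = sp_all.dot(dctmx)        # [T, 80]
mfcc_mean = mfcc_all.mean(axis=0)   # [80]
mfcc_std = mfcc_all.std(axis=0)     # [80]

# Normalize one spectrogram (here simply the first 120 frames).
sp_tmp = sp_all[:120]
cc_norm = (sp_tmp.dot(dctmx) - mfcc_mean) / mfcc_std  # [120, 80]

# Multiplying by dctmx equals a direct DCT along the mel axis.
direct = scipy.fftpack.dct(sp_all, type=2, axis=1, norm='ortho')
same_as_direct_dct = np.allclose(mfcc_all, direct)
print(same_as_direct_dct, cc_norm.shape)
```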

@auspicious3000 Thank you! How do you get sp_all, and what is sp_tmp? Is sp_all a concatenation of all spectrograms? How do I create it? Does the following make sense?

Say I have multiple spectrograms:

mfcc_list = []
for file_name in ['p225_003.npy', 'p225_008.npy', ...]:
    f = np.load(file_name)
    mfcc_list.append(f)

sp_all = np.concatenate(mfcc_list,axis=0)
mfcc_all = sp_all.dot(dctmx) 

avanitanna avatar Oct 16 '22 19:10 avanitanna

@avanitanna sp_all is the concatenation of all mel spectrograms; sp_tmp is the spectrogram you want to normalize.

auspicious3000 avatar Oct 16 '22 21:10 auspicious3000
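Putting the replies together, one possible end-to-end sketch is shown below. It assumes each mel spectrogram is saved as a `[T, 80]` `.npy` file, and it writes the stats in the `[mean, std, dctmx]` pickle layout used in the original post; that layout is not confirmed by the maintainer. The helper name `compute_mfcc_stats` and the file names are hypothetical, and the demo writes random arrays to a temporary directory in place of real features.

```python
import os
import pickle
import tempfile

import numpy as np
import scipy.fftpack


def compute_mfcc_stats(spect_paths, n_mels=80):
    """Return (mfcc_mean, mfcc_std, dctmx) from [T, n_mels] .npy spectrograms."""
    dctmx = scipy.fftpack.dct(np.eye(n_mels), type=2, axis=1, norm='ortho')
    # sp_all: all mel spectrograms concatenated along the time axis.
    sp_all = np.concatenate([np.load(p) for p in spect_paths], axis=0)
    mfcc_all = sp_all.dot(dctmx)
    return mfcc_all.mean(axis=0), mfcc_all.std(axis=0), dctmx


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder data: two fake utterances with different lengths.
    rng = np.random.default_rng(0)
    paths = []
    for name, n_frames in [('utt_a.npy', 90), ('utt_b.npy', 140)]:
        path = os.path.join(tmp_dir, name)
        np.save(path, rng.random((n_frames, 80)))
        paths.append(path)

    mfcc_mean, mfcc_std, dctmx = compute_mfcc_stats(paths)

    # Save and reload in the layout assumed from the original post.
    stats_path = os.path.join(tmp_dir, 'mfcc_stats.pkl')
    with open(stats_path, 'wb') as f:
        pickle.dump([mfcc_mean, mfcc_std, dctmx], f, pickle.HIGHEST_PROTOCOL)
    with open(stats_path, 'rb') as f:
        loaded_mean, loaded_std, loaded_dctmx = pickle.load(f)

print(loaded_mean.shape, loaded_std.shape, loaded_dctmx.shape)
```

For real data, the list of paths would point at the mel spectrograms produced by whatever feature extraction the training pipeline uses, so that the stats match the features the model sees.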