RosaKit

MFCC Function Testing

Open • rahul140490 opened this issue 3 years ago • 16 comments

Hi, I want to extract 13 MFCC values from an audio file, and I am using the newly added `mfcc` function like this: `mfcc(nMFCC: 13, nFFT: 2048, hopLength: 512, sampleRate: 22050, melsCount: 128)`

But the result of this function is a huge multidimensional array containing many double values for each chunk. As I understand it, the result should be a flat array of 13 values per chunk. Please correct me if I am wrong, and please suggest how to get it working properly.

Also, I tested this function in `SpectrogramViewController`:

```swift
private func loadData() {
    spectrograms = [[Double]]()
    guard let url = Bundle.main.url(forResource: "test", withExtension: "wav"),
          let soundFile = try? WavFileManager().readWavFile(at: url) else { return }

    let sampleRate = soundFile.sampleRate
    let bytesPerSample = soundFile.bytesPerSample
    guard bytesPerSample > 0 else { return }  // avoid division by zero below

    let chunkSize = 66000
    // Only complete chunks are processed; a trailing partial chunk is skipped.
    let chunksCount = soundFile.data.count / (chunkSize * bytesPerSample)

    // Assumes 16-bit PCM (bytesPerSample == 2).
    let rawData = soundFile.data.int16Array

    for index in 0..<chunksCount {
        // Convert Int16 PCM samples to Double in [-1, 1).
        let samples = rawData[chunkSize*index..<chunkSize*(index+1)].map { Double($0) / 32768.0 }
        let powerSpectrogram = samples.melspectrogram(nFFT: 1024, hopLength: 512, sampleRate: Int(sampleRate), melsCount: 128).map { $0.normalizeAudioPower() }
        spectrograms.append(contentsOf: powerSpectrogram.transposed)
        // Pass the file's sample rate explicitly instead of relying on mfcc's defaults.
        let mfccData = samples.mfcc(nMFCC: 13, nFFT: 2048, hopLength: 512, sampleRate: Int(sampleRate), melsCount: 128)
        print("mfcc - \(mfccData)")
    }
}
```

rahul140490 • Sep 17 '21

@rahul140490, even in librosa (https://librosa.org/doc/main/generated/librosa.feature.mfcc.html) the result is a 2D matrix.
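For reference, librosa's `mfcc` returns an array of shape `(n_mfcc, n_frames)`: one column of coefficients per analysis frame, not one vector per chunk. With librosa's default centered framing (`center=True`) and an even `n_fft`, the frame count is `1 + len(y) // hop_length`. The sketch below only computes that expected shape (`expected_mfcc_shape` is a hypothetical helper); whether RosaKit pads frames the same way is an assumption worth checking against its source.

```python
def expected_mfcc_shape(n_samples: int, n_mfcc: int = 13, hop_length: int = 512) -> tuple:
    """Shape librosa.feature.mfcc would return for a 1-D signal of n_samples
    samples, assuming center=True (librosa's default) and an even n_fft."""
    # Centered framing pads n_fft // 2 samples on each side, so with an even
    # n_fft the number of frames depends only on the hop length.
    n_frames = 1 + n_samples // hop_length
    return (n_mfcc, n_frames)


# One 66000-sample chunk from the loadData() loop above:
print(expected_mfcc_shape(66000))  # (13, 129)
```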

dhrebeniuk • Sep 17 '21

Oh yeah, that seems right. But for testing, would it be possible to compare the results of librosa and RosaKit for the same audio file and configuration values?
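One way to run that comparison is to export both matrices (for example as CSV) and measure the element-wise difference. A minimal sketch with NumPy, assuming both arrays are already loaded in the same `(n_mfcc, n_frames)` orientation; `compare_mfcc` is a hypothetical helper name:

```python
import numpy as np


def compare_mfcc(reference, candidate, atol=1e-3):
    """Return (shapes_match, max_abs_diff, within_tolerance) for two MFCC matrices."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        # A shape mismatch usually points at different framing/padding settings,
        # so an element-wise comparison would not be meaningful.
        return False, float("inf"), False
    max_diff = float(np.max(np.abs(ref - cand)))
    return True, max_diff, max_diff <= atol
```

If the shapes match but each row differs by a roughly constant factor, that suggests a normalization difference (such as DCT scaling) rather than a framing difference.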

rahul140490 • Sep 17 '21

@rahul140490, it's a good question. I have only done simple tests, and in my experience there might be problems, because I quickly ported SciPy's `dct` function to iOS; SciPy implements it in C++ and uses bridges to Python.

I tried to resolve and remove these dependencies, but there might still be some problems, because type casting works differently in C++ (an additional pain).
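For reference when porting: librosa's `mfcc` defaults to `dct_type=2` and `norm='ortho'` and delegates the transform to SciPy's DCT. For a length-$N$ input $x$, the orthonormal DCT-II is

$$
y_k = \sqrt{\frac{2}{N}}\, c_k \sum_{n=0}^{N-1} x_n \cos\!\left(\frac{\pi k (2n+1)}{2N}\right),
\qquad c_0 = \frac{1}{\sqrt{2}}, \quad c_k = 1 \ (k \ge 1).
$$

With `norm=None`, SciPy instead returns $y_k = 2\sum_{n=0}^{N-1} x_n \cos\!\left(\frac{\pi k (2n+1)}{2N}\right)$, so a port that omits the normalization differs from librosa by a fixed factor per coefficient: $\sqrt{1/(4N)}$ for $k=0$ and $\sqrt{1/(2N)}$ otherwise.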

dhrebeniuk • Sep 17 '21

Got it. Thanks for the explanation. Could you please help me with a problem I'm facing? I am trying to get a single set of 13 MFCC values for an audio file, meaning the complete .wav file should be processed in one go, not chunk-wise or frame-wise. In simpler terms: MFCCs for the whole audio file, not per chunk or per frame as in the `loadData()` function above.
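A common way to get one fixed-length vector per file is to compute the MFCC matrix for the whole signal and average each coefficient over time. This is a convention rather than the only option (some pipelines also keep standard deviations or delta features). A sketch, assuming the matrix has shape `(n_mfcc, n_frames)` as librosa returns it; `file_level_mfcc` is a hypothetical helper name:

```python
import numpy as np


def file_level_mfcc(mfcc_matrix):
    """Collapse an (n_mfcc, n_frames) MFCC matrix into one n_mfcc-length vector
    by averaging each coefficient across all frames."""
    m = np.asarray(mfcc_matrix, dtype=np.float64)
    if m.ndim != 2 or m.shape[1] == 0:
        raise ValueError("expected a non-empty (n_mfcc, n_frames) matrix")
    return m.mean(axis=1)
```

On the Python side, `file_level_mfcc(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13))` would give 13 values; the same averaging can be applied to a RosaKit matrix once its orientation is confirmed.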

rahul140490 • Sep 17 '21

I just compared the MFCC values extracted from the same audio file using RosaKit and librosa, and they are not the same. What surprises me is the difference in orders of magnitude:

Librosa [[-5.56532669e+01 -8.21184998e+01 -5.00438271e+01 -3.83153648e+01 -2.21641731e+01 -3.63747215e+01 -4.03212852e+01 -6.56709290e+01 -9.50198364e+01 -1.11017715e+02 -1.25539406e+02 -1.29669861e+02 -1.66457108e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02 -2.80174011e+02] .......]

iOS [[1.839091783e-315, 1.83909182e-315, 1.839091858e-315, 1.8390919e-315, 1.839091937e-315, 1.83909198e-315, 1.839092016e-315, 1.839092055e-315, 1.8390921e-315, 1.83909214e-315, 1.83909218e-315, 1.83909222e-315, 1.839092223e-315, 1.83909228e-315, 1.83909235e-315, 1.83909239e-315, 1.83909241e-315, 1.839092465e-315, 1.8390925e-315, 1.38686236e-315, 9.54278096e-316, 6.963632153146e-312, 1.22516989e-315, 1.22517421e-315, 1.83911609e-315, 1.83911613e-315, 1.83911617e-315, 1.83911621e-315, 1.839116254e-315, 1.839116284e-315, 1.83911633e-315, 1.83911637e-315, 1.83911641e-315, 1.839116447e-315, 1.839116487e-315, 1.83911653e-315, 1.83911657e-315, 1.839116576e-315, 1.839116635e-315, 1.8391167e-315, 1.839116744e-315, 1.83911676e-315, 1.83911682e-315, 1.839116847e-315, 6.96440509699e-312, 1.22517011e-315, 1.2251742e-315] ..........]

Where librosa gives values on the order of e+02, RosaKit gives values on the order of e-315. Here are the methods I used to extract the coefficients:

[Two screenshots of the extraction methods; images not preserved]
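One observation that may help narrow this down: every RosaKit value shown above is a subnormal double (non-zero but smaller in magnitude than about 2.2e-308), while the librosa values are of order 1e1 to 1e2. Values like these are more consistent with memory being read with the wrong type (for example, integer bit patterns reinterpreted as `Double`) than with a scaling difference. That is a hypothesis, not a confirmed diagnosis; the sketch below only shows the check and the reinterpretation effect in Python (both helper names are hypothetical):

```python
import struct
import sys


def is_subnormal(x: float) -> bool:
    """True for non-zero doubles below the smallest normal double (~2.2e-308)."""
    return x != 0.0 and abs(x) < sys.float_info.min


def int_bits_as_double(n: int) -> float:
    """Reinterpret a non-negative 64-bit integer's bit pattern as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", n))[0]


print(is_subnormal(1.839091783e-315))  # True  (RosaKit value from above)
print(is_subnormal(-5.56532669e+01))   # False (librosa value from above)
# An integer in the hundreds of millions, read back as a double, lands in the same range:
print(int_bits_as_double(372_241_000))
```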

Thanks

popigg • Nov 30 '21

Hi @popigg,

Were you able to figure out how to get results from RosaKit similar to librosa's? Or did you follow some other implementation? Please share, as I have been stuck on this for a year now.

Hi @dhrebeniuk, I've also tried to port multiple implementations of the orthogonal DCT-II, but none of them gives results similar to Python's. Please suggest a way forward; I see every author has their own version of the DCT.
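When comparing ports, it can help to have a slow but obviously correct reference. Below is a direct O(N²) evaluation of the DCT-II definition in plain Python, with the two normalizations SciPy offers for type 2 (`norm=None` and `norm='ortho'`); librosa's MFCC uses the orthonormal one. `dct2_reference` is an illustrative name, not a library function:

```python
import math


def dct2_reference(x, norm=None):
    """Direct O(N^2) DCT-II of a 1-D sequence.

    norm=None    -> y_k = 2 * sum_n x_n * cos(pi * k * (2n + 1) / (2N))   (SciPy default)
    norm='ortho' -> same sum scaled by sqrt(1/(4N)) for k = 0, sqrt(1/(2N)) otherwise
    """
    if norm not in (None, "ortho"):
        raise ValueError("norm must be None or 'ortho'")
    n_points = len(x)
    out = []
    for k in range(n_points):
        s = sum(
            xn * math.cos(math.pi * k * (2 * n + 1) / (2 * n_points))
            for n, xn in enumerate(x)
        )
        y = 2.0 * s
        if norm == "ortho":
            y *= math.sqrt(1.0 / (4 * n_points)) if k == 0 else math.sqrt(1.0 / (2 * n_points))
        out.append(y)
    return out
```

Because the orthonormal variant preserves energy, the sum of squares of its output should equal that of the input, which makes a quick sanity check for a Swift port.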

rahulkumaratphilips • Mar 11 '22

Hi, @rahulkumaratphilips. I found a way to extract MFCCs on iOS, but without RosaKit: I used https://aubio.org/ and it worked 🚀. I had to retrain the model with this new feature extractor. It is tough because you have to accumulate the MFCCs for the selected window manually, and on iOS the implementation is based on a C library whose API looks a bit different. If you are interested in following that path, I can help with a gist with some code. Good luck,

popigg • Mar 11 '22

Hi @popigg, yes, I'd be very interested in trying aubio if you could help me get started. Also, how different are aubio's MFCCs from librosa's?

rahulkumaratphilips • Mar 11 '22

@dhrebeniuk, I'm sorry, but today at a checkpoint Russian soldiers told me to put my MacBook and iPhone on the ground and leave them. When I get my devices back, I will be able to take this task on.

dhrebeniuk • Mar 11 '22

Hi @dhrebeniuk / @popigg, I've found a Python implementation of the DCT that gives the same results as SciPy's DCT. But I need your help porting it to Swift, as I am not very fluent in Python. I'd really appreciate it if you could help me with it, as it's the only missing piece in my MFCC problem. The DCT implementation is:

```python
import numpy as np


def dct2(x, n=None):
    """Unnormalized DCT-II along the last axis (matches SciPy's type-2 DCT
    with norm=None), computed with a single FFT."""
    fft = np.fft.fft
    x = np.atleast_1d(x)

    if n is None:
        n = x.shape[-1]

    # Zero-pad or truncate the last axis to length n.
    if x.shape[-1] < n:
        n_shape = x.shape[:-1] + (n - x.shape[-1],)
        xx = np.concatenate((x, np.zeros(n_shape)), axis=-1)
    else:
        xx = x[..., :n]

    real_x = np.all(np.isreal(xx))

    if real_x and n % 2 == 0:
        # Even-length real input: reorder as [x0, x2, ..., x3, x1] and take
        # one length-n FFT (Makhoul's method).
        xp = 2 * fft(np.concatenate((xx[..., ::2], xx[..., ::-2]), axis=-1))
    else:
        # General case: FFT of the length-2n symmetric extension [x, reversed x].
        xp = fft(np.concatenate((xx, xx[..., ::-1]), axis=-1))
        xp = xp[..., :n]

    # Twiddle factors exp(-i*pi*k/(2n)) turn the FFT output into cosine sums.
    w = np.exp(-1j * np.arange(n) * np.pi / (2 * n))
    y = xp * w

    return y.real if real_x else y
```
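One source of mismatch worth ruling out: the FFT-based `dct2` above computes SciPy's unnormalized DCT-II (`norm=None`), while librosa's `mfcc` defaults to `norm='ortho'`. Whatever form the Swift port takes, its output would need the orthonormal scaling applied afterwards. A sketch on a plain list; `ortho_scale` is a hypothetical helper name:

```python
import math


def ortho_scale(y):
    """Convert an unnormalized DCT-II result (SciPy norm=None) to norm='ortho'.

    Coefficient 0 is scaled by sqrt(1/(4N)); all others by sqrt(1/(2N)),
    where N is the transform length.
    """
    n_points = len(y)
    if n_points == 0:
        return []
    first = math.sqrt(1.0 / (4 * n_points))
    rest = math.sqrt(1.0 / (2 * n_points))
    return [v * (first if k == 0 else rest) for k, v in enumerate(y)]


print(ortho_scale([8.0, 0.0, 0.0, 0.0]))  # [2.0, 0.0, 0.0, 0.0]
```

For MFCCs, librosa applies the DCT across the mel-band axis and then keeps the first `n_mfcc` rows, so N here is the number of mel bands (128 in the calls earlier in this thread), not 13.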

rahulkumaratphilips • Mar 14 '22

Hey @rahulkumaratphilips.

I have created these two gists. This is how it works for me using aubio.

Swift MFCC extractor: https://gist.github.com/popigg/3847a4cf71a1898e795f3fa5b8aff9a2

Python MFCC extractor: https://gist.github.com/popigg/de8d8db8ceb7db5adb23d58477a92e74

The aubio installation guide can be found here: https://aubio.org/manual/latest/installing.html

popigg • Mar 15 '22

Hey @popigg,

Thanks a lot for your support. I'll try these out and let you know.

rahulkumaratphilips • Mar 15 '22

@popigg, @rahulkumaratphilips, hello! If you can send a pull request with your changes, please do, and I will approve it.

dhrebeniuk • Mar 15 '22

Hi @dhrebeniuk, I have a request. I know you're probably busy with other features, but if you get time, please look into why our DCT function's output doesn't match Python's DCT function. This DCT function is the only reason our MFCC values don't match librosa's.

rahulkumaratphilips • Mar 25 '22

> Hi @dhrebeniuk, I have a request. I know you're probably busy with other features, but if you get time, please look into why our DCT function's output doesn't match Python's DCT function. This DCT function is the only reason our MFCC values don't match librosa's.

@rahulkumaratphilips I don't think @dhrebeniuk is unavailable because they're busy with features. It's because of Russia's invasion of Ukraine.

I too am encountering some issues with MFCCs not lining up with librosa, but... @dhrebeniuk please take care of yourself and your family and make sure you're safe before you feel like you might want to contribute changes or get back to us. This thread can wait.

Thank you for RosaKit! It's been a great little library and has helped do some cool things that aren't quite covered by Apple's built-in DSP.

zac • Mar 25 '22

Oh, my bad, @dhrebeniuk, I didn't know you're from Ukraine. Hoping you can find the strength to keep going, knowing how many people around the world support you. One day soon things will be better. Our continuous thoughts of support are with you! Please stay strong and take care of yourself and your loved ones as best you can.

rahulkumaratphilips • Mar 25 '22