MFCC differences between torchaudio and CoreML
🐞Describing the bug
I want to train a model that uses MFCCs as its input. Conveniently, coremltools is able to convert torchaudio's MFCC transform :) but there are some numerical differences between the two outputs. I assume these differences arise because CoreML and torchaudio use different parameters (FFT size, number of mel bands, and so on). The mismatch seems to be driven mainly by the higher frequencies (reducing torchaudio's n_mels shrinks the difference a bit).
What are the recommended parameters to minimize the discrepancy between CoreML's and torchaudio's MFCC?
To Reproduce
import torch
import torchaudio
import coremltools
import numpy as np


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mfcc = torchaudio.transforms.MFCC()

    def forward(self, wav):
        return self.mfcc(wav)


# Load the test clip and trace the model on it.
x, fs = torchaudio.load("test.wav", normalize=True)
model = Model()
model.eval()
model = torch.jit.trace(model, x)
y = model(x).numpy()

# Convert to an ML Program and run the same input through CoreML.
core_model = coremltools.convert(
    model, convert_to="mlprogram", inputs=[coremltools.TensorType(shape=x.shape)]
)
core_model.save("newmodel.mlpackage")
core_y = core_model.predict({"wav": x.numpy()})

difference = np.abs(next(iter(core_y.values())) - y).mean()
print(difference)  # 0.04909986
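One way to see where the ~0.05 mean error comes from is to break it down per MFCC coefficient. A sketch with synthetic stand-ins (in the repro above, the two random arrays would be replaced by y and next(iter(core_y.values()))):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two MFCC outputs, shaped (batch, n_mfcc, frames);
# replace with the real torchaudio and CoreML arrays from the repro.
y_torch = rng.standard_normal((1, 40, 81)).astype(np.float32)
y_coreml = y_torch + 0.05 * rng.standard_normal((1, 40, 81)).astype(np.float32)

# Mean absolute error per coefficient: if the mismatch really is driven by
# higher frequencies, it should grow toward the higher-order coefficients.
per_coeff = np.abs(y_coreml - y_torch).mean(axis=(0, 2))
print(per_coeff.shape)          # (40,) -- one value per MFCC coefficient
print(int(per_coeff.argmax()))  # index of the worst coefficient
```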
System environment (please complete the following information):
- coremltools version: 7.2
- OS (e.g. MacOS version or Linux type): macOS 14.4.1 (M2)
- Any other relevant version information: torch 2.2.0