Anton Marini

Results 157 comments of Anton Marini

Ok, here's a deep dive on the state of the Log Mel Spectrogram code.

# The following are the custom implementations

* `FFT.swift` exposes both a real and complex FFT...

Oh amazing @VimalMollyn - I really appreciate any insight. I'm not an audio guy, and I know from some reading that Apple's vDSP / FFT implementation does some funky interleaving...
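For anyone following along, here is a rough sketch of what that interleaving looks like, assuming it refers to the packed split-complex output of vDSP's real FFT (`vDSP_fft_zrip`). As far as I understand Apple's documentation, the forward real transform returns values scaled by 2 and stores the Nyquist bin in the imaginary slot of bin 0. The helper names `pack_like_zrip` and `unpack_zrip` are made up for illustration, and the pure-Python DFT only stands in for a reference such as NumPy's `rfft`.

```python
import cmath

def naive_rfft(x):
    """Reference DFT of a real signal, bins 0..N/2 (same layout as an rfft)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def pack_like_zrip(spectrum):
    """Simulate the packed layout (assumption: 2x scale, Nyquist in imagp[0])."""
    half = len(spectrum) - 1
    realp = [2.0 * spectrum[k].real for k in range(half)]
    imagp = [2.0 * spectrum[k].imag for k in range(half)]
    imagp[0] = 2.0 * spectrum[half].real  # Nyquist hides in the imag slot of bin 0
    return realp, imagp

def unpack_zrip(realp, imagp):
    """Undo the packing and the factor of 2 to get N/2 + 1 ordinary complex bins."""
    half = len(realp)
    bins = [complex(realp[0] / 2.0, 0.0)]  # DC is purely real
    bins += [complex(realp[k] / 2.0, imagp[k] / 2.0) for k in range(1, half)]
    bins.append(complex(imagp[0] / 2.0, 0.0))  # Nyquist is purely real
    return bins

signal = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.3, 0.1]
reference = naive_rfft(signal)
realp, imagp = pack_like_zrip(reference)
recovered = unpack_zrip(realp, imagp)
```

If the Swift side skips the unpacking or the division by 2, every bin is off by a constant factor and the DC / Nyquist bins are wrong, which is one plausible source of a mismatch against the Python output.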

OOOOOOOkkaaay. So after a long horrifying look into this, I think for now this is off the table unless someone with way more free time on their hands wants to...

In theory, a nice to have would be to just:

* extract the forward real to complex Pocket FFT strategy
* match the divide and conquer approach using vDSP's new...

Oh, I should state, the current state of our Mel work with vDSP is much much closer than before, albeit with numerically incorrect output which I think is due to...

If I, or anyone else, actually does any of the above work to match PocketFFT output with vDSP's DFT code, ideally it should be packaged into MatFT which is a...

The Python STFT / Mel code is just the following, taken from the OpenAI Whisper code base: https://github.com/openai/whisper/blob/main/whisper/audio.py#L115

```
def log_mel_spectrogram(audio: Union[str, np.ndarray, torch.Tensor], n_mels: int = N_MELS):
    """
    Compute...
```
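For reference when porting, the tail end of that function (after the STFT magnitudes have been multiplied by the mel filters) is a small log and normalization step. Below is a dependency-free sketch of how I read it in the linked file: clamp at `1e-10`, take `log10`, floor everything at the global max minus 8, then shift and scale. The function name `whisper_log_normalize` and the nested-list input are assumptions for illustration; the original operates on torch tensors.

```python
import math

def whisper_log_normalize(mel_power):
    """mel_power: list of rows (n_mels x n_frames) of non-negative mel energies."""
    # Clamp before the log so silent bins do not produce -inf.
    log_spec = [[math.log10(max(v, 1e-10)) for v in row] for row in mel_power]
    # Limit dynamic range to 8 log10 units below the loudest bin.
    floor = max(max(row) for row in log_spec) - 8.0
    # Shift and scale as in the reference implementation.
    return [[(max(v, floor) + 4.0) / 4.0 for v in row] for row in log_spec]

example = whisper_log_normalize([[1.0, 1e-12], [100.0, 0.01]])
```

Because the floor depends on the global maximum, a single wrong bin coming out of the FFT stage can shift the whole normalized output, so it is worth comparing the pre-log mel energies first.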

I also have a test Google Colab to go through and debug results: https://colab.research.google.com/drive/1r9ghakH8__jGqGiYHC2DXtKaW_ozdSrV?usp=sharing

Here is the audio file I have been testing with and the precomputed mel filters taken from Whisper, which are loaded in the Colab: [mel_filters.npz.zip](https://github.com/vade/OpenAI-Whisper-CoreML/files/10609554/mel_filters.npz.zip) [A045_C001_0603BW_analyzed-Mono 16kHz Float PCM.wav.zip](https://github.com/vade/OpenAI-Whisper-CoreML/files/10609556/A045_C001_0603BW_analyzed-Mono.16kHz.Float.PCM.wav.zip)

Yea, I was thinking the same. I naively tried an experiment to zero-pad to the next pow2 FFT size and ran vDSP zrip on the results and compared...
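One caveat with the zero-padding experiment, sketched below with a pure-Python DFT standing in for both libraries: padding changes the bin grid. Whisper uses a 400-sample FFT at 16 kHz (201 bins, 40 Hz apart), while a 512-point transform gives 257 bins spaced 31.25 Hz apart, so the padded output cannot be compared bin for bin against the reference, and the precomputed mel filters no longer line up. The helper names here are hypothetical, and the 12-sample frame is just a toy size to keep the demo fast.

```python
import cmath
import math

def naive_rfft(x):
    """Reference DFT of a real signal, bins 0..N/2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(frame):
    """Append zeros until the frame length is a power of two."""
    size = next_pow2(len(frame))
    return list(frame) + [0.0] * (size - len(frame))

def bin_hz(k, n_fft, sample_rate=16000):
    """Center frequency of bin k for a given FFT size."""
    return k * sample_rate / n_fft

# Toy frame: exactly 3 cycles in 12 samples, so the native DFT has one clean peak.
frame = [math.sin(2 * math.pi * 3 * t / 12) for t in range(12)]
native = naive_rfft(frame)            # 12-point transform
padded = naive_rfft(zero_pad(frame))  # 16-point transform of the padded frame

print(len(native), len(padded))
print(bin_hz(1, 400), bin_hz(1, 512))
```

In the native transform all energy sits in one bin, while the padded transform spreads it across neighbouring bins, so a mismatch after padding does not by itself mean the vDSP call is wrong.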