Anton Marini comments

Generate Log Mel Spectrograms with vDSP natively

OK, here's a deep dive on the state of the Log Mel Spectrogram code. The following are the custom implementations: * `FFT.swift` exposes both a real and complex FFT...
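For real input of length N, the complex DFT is conjugate-symmetric, which is why a real FFT only needs to return N/2 + 1 bins. Below is a naive pure-Python reference that could serve as ground truth when checking either code path. It is not the `FFT.swift` code, and the helper names `dft` and `rdft` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) complex DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total))
        for k in range(n_total)
    ]


def rdft(x):
    """Real-input DFT: only bins 0..N/2 are kept, because for real x the
    remaining bins are complex conjugates of these (X[N - k] == conj(X[k]))."""
    return dft(x)[: len(x) // 2 + 1]


signal = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.0, 1.0]
full = dft(signal)   # 8 complex bins
half = rdft(signal)  # 5 complex bins: DC, 1, 2, 3, Nyquist
```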

Oh amazing @VimalMollyn - I really appreciate any insight. I'm not an audio guy, and I know from some reading that Apple's vDSP / FFT implementation does some funky interleaving...
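The "funky interleaving" can be modeled in a few lines. This sketch assumes Apple's documented packing for `vDSP_fft_zrip`:

- The real input is split into even and odd samples, as `vDSP_ctoz` does.
- The forward result is 2x the textbook DFT.
- The purely real Nyquist bin is stored in `imagp[0]` next to DC in `realp[0]`.

The helpers `ctoz`, `pack_like_vdsp` and `unpack_from_vdsp` are hypothetical pure-Python stand-ins, not vDSP calls.

```python
import cmath
import math


def rdft(x):
    """Naive real-input DFT returning bins 0..N/2 (numpy.fft.rfft convention)."""
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total))
        for k in range(n_total // 2 + 1)
    ]


def ctoz(x):
    """Split real samples into even/odd halves, as vDSP_ctoz does when a real
    buffer is viewed as interleaved complex pairs."""
    return x[0::2], x[1::2]


def pack_like_vdsp(bins):
    """Assumed vDSP_fft_zrip forward layout: values are 2x the textbook DFT,
    DC goes in realp[0], and the (purely real) Nyquist bin goes in imagp[0]."""
    half = len(bins) - 1
    realp = [2.0 * bins[k].real for k in range(half)]
    imagp = [2.0 * bins[k].imag for k in range(half)]
    imagp[0] = 2.0 * bins[half].real  # overwrite DC's (zero) imaginary slot
    return realp, imagp


def unpack_from_vdsp(realp, imagp):
    """Inverse of pack_like_vdsp: recover N/2 + 1 bins with textbook scaling."""
    half = len(realp)
    bins = [complex(realp[0], 0.0) / 2.0]
    bins += [complex(realp[k], imagp[k]) / 2.0 for k in range(1, half)]
    bins.append(complex(imagp[0], 0.0) / 2.0)
    return bins
```

With this layout in mind, a vDSP result can be compared bin for bin with PocketFFT or `numpy.fft.rfft` output. First divide by two, then move `imagp[0]` into its own Nyquist bin.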

OOOOOOOkkaaay. So after a long horrifying look into this, I think for now this is off the table unless someone with way more free time on their hands wants to...

In theory, a nice-to-have would be to just: * extract the forward real-to-complex PocketFFT strategy * match the divide-and-conquer approach using vDSP's new...
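Whisper's `N_FFT` is 400, which is not a power of two. The general divide-and-conquer idea for such a length is mixed-radix Cooley-Tukey: factor N (here 400 = 2^4 * 5^2), compute FFTs of the strided subsequences, and recombine them with twiddle factors. The sketch below illustrates only that idea, in pure Python. PocketFFT's real implementation is considerably more elaborate, and `mixed_radix_fft` and `smallest_factor` are hypothetical names.

```python
import cmath
import math


def smallest_factor(n):
    """Smallest prime factor of n (n itself if n is prime)."""
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            return p
    return n


def mixed_radix_fft(x):
    """Recursive decimation-in-time Cooley-Tukey FFT for any length.

    Splits N = p * m using the smallest prime factor p, computes p sub-FFTs
    of length m on the strided subsequences x[r::p], then combines them:
    X[k] = sum_r exp(-2*pi*i*r*k/N) * S_r[k mod m].
    Prime lengths fall back to a direct O(N^2) DFT.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    p = smallest_factor(n)
    if p == n:
        return [
            sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)
        ]
    m = n // p
    subs = [mixed_radix_fft(x[r::p]) for r in range(p)]
    out = [0j] * n
    for k in range(n):
        total = 0j
        for r in range(p):
            total += cmath.exp(-2j * math.pi * r * k / n) * subs[r][k % m]
        out[k] = total
    return out
```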

Oh, I should state: the current state of our Mel work with vDSP is much, much closer than before, albeit with numerically incorrect output, which I think is due to...
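When output is close but numerically off, framing is one detail worth ruling out alongside the FFT itself. As far as I know, Whisper's `torch.stft` call relies on PyTorch defaults:

- `center=True`, with reflect padding of `N_FFT // 2` samples on each side.
- A periodic Hann window, since `torch.hann_window` is periodic by default.
- `N_FFT = 400` and `HOP_LENGTH = 160`.

The sketch below rebuilds those frames in pure Python so they can be compared against the Swift framing. The helper names are hypothetical.

```python
import math

N_FFT = 400       # Whisper's window / FFT length
HOP_LENGTH = 160  # Whisper's hop (10 ms at 16 kHz)


def periodic_hann(n):
    """Periodic Hann window, matching torch.hann_window(n) defaults."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def reflect_pad(x, pad):
    """Reflect-pad without repeating the edge sample (numpy/torch 'reflect')."""
    left = x[pad:0:-1]
    right = x[-2:-pad - 2:-1]
    return left + x + right


def frames(x, n_fft=N_FFT, hop=HOP_LENGTH):
    """Windowed frames as torch.stft(center=True, pad_mode='reflect') builds them."""
    padded = reflect_pad(list(x), n_fft // 2)
    window = periodic_hann(n_fft)
    count = 1 + (len(padded) - n_fft) // hop
    return [
        [padded[f * hop + i] * window[i] for i in range(n_fft)]
        for f in range(count)
    ]
```

Whisper then drops the last STFT frame. So 30 s of 16 kHz audio (480,000 samples) gives 3,001 frames here and 3,000 after the drop.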

If I, or anyone else, actually does any of the above work to match PocketFFT output with vDSP's DFT code, ideally it should be packaged into MatFT which is a...

The Python STFT / Mel code is just the following, taken from the OpenAI Whisper code base: https://github.com/openai/whisper/blob/main/whisper/audio.py#L115

```
def log_mel_spectrogram(audio: Union[str, np.ndarray, torch.Tensor], n_mels: int = N_MELS):
    """
    Compute...
```
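After the STFT, the rest of that Whisper function (as I recall it) performs these steps:

1. Drop the last frame.
2. Take the squared magnitude.
3. Apply the mel filter bank (`filters @ magnitudes`).
4. Take `log10` with a floor of `1e-10`.
5. Clamp to 8.0 below the maximum.
6. Rescale with `(log_spec + 4.0) / 4.0`.

The dependency-free sketch below of steps 2 to 6 may help when checking the Swift side stage by stage. `log_mel_from_stft` is a hypothetical name, and it assumes the input frames already have the last frame removed.

```python
import math


def log_mel_from_stft(stft_frames, mel_filters):
    """Mirror the post-STFT steps of Whisper's log_mel_spectrogram.

    stft_frames: list of frames, each a list of N_FFT // 2 + 1 complex bins
                 (last frame already dropped, as Whisper does).
    mel_filters: n_mels rows, each with N_FFT // 2 + 1 weights.
    Returns n_mels rows x n_frames columns.
    """
    # Power spectrum: |X|^2 per bin, per frame.
    power = [[abs(b) ** 2 for b in frame] for frame in stft_frames]
    # mel_spec = filters @ magnitudes  (n_mels x n_frames)
    mel = [
        [sum(w * p for w, p in zip(row, frame)) for frame in power]
        for row in mel_filters
    ]
    # log10 with a floor of 1e-10, then clamp to 8.0 (80 dB) below the max.
    log_spec = [[math.log10(max(v, 1e-10)) for v in row] for row in mel]
    peak = max(max(row) for row in log_spec)
    return [[(max(v, peak - 8.0) + 4.0) / 4.0 for v in row] for row in log_spec]
```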

I also have a test Google Colab to go through and debug results: https://colab.research.google.com/drive/1r9ghakH8__jGqGiYHC2DXtKaW_ozdSrV?usp=sharing

Here are the audio file I have been testing with and the precomputed mel filters taken from Whisper, both of which are loaded in the Colab: [mel_filters.npz.zip](https://github.com/vade/OpenAI-Whisper-CoreML/files/10609554/mel_filters.npz.zip) [A045_C001_0603BW_analyzed-Mono 16kHz Float PCM.wav.zip](https://github.com/vade/OpenAI-Whisper-CoreML/files/10609556/A045_C001_0603BW_analyzed-Mono.16kHz.Float.PCM.wav.zip)

Yeah, I was thinking the same. I naively tried an experiment: zero-pad to the next power-of-two FFT size, run vDSP's zrip on the result, and compare...
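Zero-padding to the next power of two cannot reproduce the 400-point result bin for bin. Bin k of a 512-point DFT samples frequency k/512 cycles per sample instead of k/400. It also yields 257 bins rather than the 1 + 400 // 2 = 201 that Whisper's precomputed mel filters are shaped for. The pure-Python check below uses a tone that sits exactly on bin 10 of the 400-point grid. The helpers `dft_bin` and `next_pow2` are hypothetical.

```python
import cmath
import math


def dft_bin(x, k, n_fft):
    """Bin k of an n_fft-point DFT of x (x is implicitly zero-padded to n_fft)."""
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(len(x)))


def next_pow2(n):
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


N = 400
tone = [math.cos(2 * math.pi * 10 * n / N) for n in range(N)]  # exactly bin 10 at N = 400

exact = abs(dft_bin(tone, 10, N))              # 400-point DFT: N / 2 = 200
padded = abs(dft_bin(tone, 10, next_pow2(N)))  # 512-point bin 10 is a different frequency
```

In the padded transform, the tone's energy lands near bin 12.8 rather than bin 10, so the magnitudes will not line up with PocketFFT's 400-point output.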