Support MLX in WhisperAX
Adds MLX support to the WhisperAX example app, along with various refactors and cleanup.
Note: this is using an unreleased version of MLX, pending the merge of https://github.com/ml-explore/mlx-swift/pull/130
There are still some memory issues to address; I'll look into these soon.
CoreML: (attachment omitted)
MLX: (attachment omitted)
Important note: the first time you run this, it will likely throw an error about PrepareMetalShaders. Select the error and choose Trust & Enable in the popup to proceed.
Hey all, just a heads-up, the example project gets this SPM error:
That fork was merged and is now the 0.16.2 tag on https://github.com/ml-explore/mlx-swift
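For anyone updating their dependency, a minimal Package.swift pin using standard SPM syntax (the version requirement shown is just the tag mentioned above):

```swift
// Pin mlx-swift to the released tag instead of a fork/branch.
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift", from: "0.16.2")
]
```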
Awesome, thanks for the update @davidkoski!
@ZachNagengast - This looks awesome! Apologies - rookie question... It looks like the MLX repo currently only has the base and tiny models. How hard is it for mere mortals to "build" some of the larger models for testing this out?
Or... can you use existing models with MLX?
Sorry, I missed your original message! We will fill in the MLX repo with the remaining models as part of this release; we just made these copies for consistency with our Swift package. Any existing MLX Whisper model with the same naming scheme should work in theory 👍 @jkrukowski may be able to confirm or deny.
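To illustrate the naming-scheme point, a hypothetical usage sketch; the repo name and the exact initializer parameters here are assumptions for illustration, not confirmed API:

```swift
import WhisperKit

// Hypothetical: point WhisperKit at an MLX model repo that follows the
// same naming scheme as the CoreML one; repo name is illustrative.
let pipe = try await WhisperKit(
    model: "tiny",
    modelRepo: "argmaxinc/whisperkit-mlx"  // assumed repo name
)
let results = try await pipe.transcribe(audioPath: "recording.wav")
```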
Legend, thanks @ZachNagengast! So basically, do we need new models for MLX, or can we use the existing WhisperKit models? Do they need to be optimised or something?
Yep, the existing WhisperKit models are optimized for CoreML; the ones in this repo we will fill out with equivalent weights that are compatible with this MLX PR.
Sorry for all the rookie questions, but when you say "optimised for CoreML" - does this mean they ONLY work on CoreML, or can you use these CoreML models in MLX and they're just not as fast/accurate?
Apologies - this whole Whisper world is very new to me, so I very much appreciate all your wisdom and support!
@ZachNagengast Is there any model conversion script we can run, or any other source we can use, in order to create/obtain more MLX compatible model versions?
@latenitefilms Yes, the .mlmodelc models only work with CoreML at the moment. @maxlund There is a conversion script made by @jkrukowski here: https://github.com/argmaxinc/WhisperKit/pull/169 - we'll integrate it into https://github.com/argmaxinc/whisperkittools in the future.
Legend, thanks so much @ZachNagengast! Do you have a rough/ballpark ETA of when you're hoping to finish and merge in MLX support? No rush or pressure - just wondering if it's worth trying to convert our own models or not.
Let me know if there's anything I can do to help with MLX testing/release! Would love to see this in action ASAP!
Thanks for EVERYTHING you do! Appreciate it!
There are just a few optimizations to fix up to make it ready for release, specifically around memory usage. The current issues are (a rough sketch follows the list):
- MLX does not require a prewarm stage, so it should be skipped. Currently it's loading the model twice without freeing the memory; this can also be solved by setting a cache limit or clearing the cache after load
- The KV cache should use this instead of MLMultiArrays
- Sampling can be compiled for some easy speedups
- Attention should use SDPA instead of the current logic
These are paraphrased from @davidkoski and @awni
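A rough sketch of what a few of these could look like with mlx-swift (GPU.set(cacheLimit:), compile, and MLXFast.scaledDotProductAttention are the mlx-swift calls as I understand them; everything else is illustrative, not WhisperKit code, and exact signatures may differ):

```swift
import MLX
import MLXFast

// 1. Bound the Metal buffer cache so a second model load can't
//    double peak memory (limit shown is arbitrary).
GPU.set(cacheLimit: 512 * 1024 * 1024)

// 2. Compile the sampling step once and reuse the compiled closure.
let greedySample = compile { (inputs: [MLXArray]) -> [MLXArray] in
    let logits = inputs[0]
    return [argMax(logits, axis: -1)]
}

// 3. Fused scaled-dot-product attention instead of hand-rolled matmul + softmax.
func attend(q: MLXArray, k: MLXArray, v: MLXArray, scale: Float) -> MLXArray {
    MLXFast.scaledDotProductAttention(
        queries: q, keys: k, values: v, scale: scale, mask: nil
    )
}
```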
Will be revisiting this after the upcoming release, but feel free to test with the current branch; let us know if you see any other potential speedups besides these. All the interfaces should be the same in the final form, just faster and more memory efficient with these changes.
Amazing! Thanks so much! Will test out and let you know if I break anything.
Any update on this? Is it usable? @latenitefilms @ZachNagengast? Can someone please point me in the direction of how I can get this set up?
Hi @anishjain123, these issues are still pending: https://github.com/argmaxinc/WhisperKit/pull/200#issuecomment-2395207140, but this branch is technically usable. We'd like to resolve the perf and memory issues before merging, which is still a high priority for us! I'm working on a refactor right now to allow various model input and output types, including MLXArray, which should help with the issues converting between MLMultiArray and MLXArray (rough sketch below).
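For context on what that refactor might enable, a minimal sketch of a unified tensor type; these names are hypothetical, not actual WhisperKit API:

```swift
import CoreML
import MLX

// Hypothetical wrapper so pipeline stages can pass either backend's tensors
// without converting through MLMultiArray every time.
enum ModelData {
    case coreML(MLMultiArray)
    case mlx(MLXArray)
}

// A stage keeps data in its native representation end to end.
protocol AudioEncoding {
    func encodeFeatures(_ input: ModelData) async throws -> ModelData
}
```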
In order to consolidate a bit, I'll merge this PR into the parent branch so that there is one PR to manage the final changes in, and will link the relevant discussions there. If anyone has made progress on any of the topics discussed here, please post in #124; there is some updating to do with the latest from main as well as mlx-swift.
@anishjain123 there is also this alternative, as you probably know already: mlx-whisper
@ZachNagengast what would be the differences between running Whisper via MLX using your implementation vs the mlx-whisper version (see link above)?
