WhisperKit Unable to load model (or very very slow)

Hi,

I'm trying to load models on iOS. When I try to load one on my development iPhone (iOS 17, iPhone 13 Pro), it doesn't work. It seems to get stuck on "Loading audio encoder". I'm loading openai_whisper-large-v3-v20240930_turbo.

I'm also getting:

ANE model load has failed for on-device compiled macho. Must re-compile the E5 bundle. @ GetANEFModel
E5RT: ANE model load has failed for on-device compiled macho. Must re-compile the E5 bundle. (13)
E5RT encountered an STL exception. msg = MILCompilerForANE error: failed to compile ANE model using ANEF. Error=Couldn’t communicate with a helper application..
E5RT: MILCompilerForANE error: failed to compile ANE model using ANEF. Error=Couldn’t communicate with a helper application. (11)
[Espresso::handle_ex_plan] exception=ANECF error: failed to load ANE model file:///private/var/mobile/Containers/Data/Application/4D70A0C9-753D-4885-B517-1553E5A1F338/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-large-v3-v20240930_turbo/AudioEncoder.mlmodelc/model.mil Error=createProgramInstanceForModel:modelToken:qos:isPreCompiled:enablePowerSaving:skipPreparePhase:statsMask:memoryPoolID:enableLateLatch:modelIdentityStr:owningPid:cacheUrlIdentifier:aotCacheUrlIdentifier:error:: Program load failure (0x20004)

What is the expected loading time on an iPhone 13 Pro? Is there any build configuration I'm missing?

Thanks,

Hugo

Dec 03 '24 18:12 HugoDellinger

Some models are simply too large to load on an iPhone 13 Pro, and this is one of them. Luckily, we've spent a lot of effort quantizing and optimizing many of the biggest models to work on the smallest devices, and we've created a list of the models that are supported on each device. It is directly accessible in WhisperKit via this handy function, and the example app also shows how it can be used here.

This method pulls from https://huggingface.co/argmaxinc/whisperkit-coreml/blob/main/config.json#L40-L57, which we have benchmarked extensively to verify performance and functionality. The results are viewable in this Hugging Face space: https://huggingface.co/spaces/argmaxinc/whisperkit-benchmarks. This will help you choose the best accuracy/performance tradeoff for all the phones we support. Hope that helps!
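For anyone who wants to see the idea without pulling in the library, here is a rough Swift sketch of looking up the per-device list described above and falling back to the device default when the requested model isn't supported. Everything here is an assumption for illustration: the field names (`device_support`, `identifiers`, `models.default`, `models.supported`) are a guess at the config layout and should be checked against the real config.json, `supportEntry(for:in:)` is a hypothetical helper rather than a WhisperKit function, and the sample JSON is made up, not the real support list.

```swift
import Foundation

// Assumed shape of one device-support entry; verify against the real config.json.
struct DeviceSupport: Decodable {
    struct Models: Decodable {
        let defaultModel: String
        let supported: [String]
        enum CodingKeys: String, CodingKey {
            case defaultModel = "default"
            case supported
        }
    }
    let identifiers: [String]
    let models: Models
}

struct SupportConfig: Decodable {
    let deviceSupport: [DeviceSupport]
    enum CodingKeys: String, CodingKey {
        case deviceSupport = "device_support"
    }
}

/// Hypothetical helper: finds the entry listing the device family,
/// i.e. the part of the hardware identifier before the comma ("iPhone14" for "iPhone14,2").
func supportEntry(for deviceIdentifier: String, in config: SupportConfig) -> DeviceSupport? {
    let family = deviceIdentifier.split(separator: ",").first.map(String.init) ?? deviceIdentifier
    return config.deviceSupport.first { $0.identifiers.contains(family) }
}

// Made-up sample data for illustration only; NOT the real support list.
let sampleJSON = """
{
  "device_support": [
    {
      "identifiers": ["iPhone14"],
      "models": {
        "default": "openai_whisper-tiny",
        "supported": ["openai_whisper-tiny", "openai_whisper-large-v3-v20240930_turbo_632MB"]
      }
    }
  ]
}
"""

let config = try JSONDecoder().decode(SupportConfig.self, from: Data(sampleJSON.utf8))
let requested = "openai_whisper-large-v3-v20240930_turbo"

// "iPhone14,2" is the hardware identifier of the iPhone 13 Pro.
let entry = supportEntry(for: "iPhone14,2", in: config)

// Use the requested model only if the device entry lists it; otherwise use the default.
let chosen = entry.map { $0.models.supported.contains(requested) ? requested : $0.models.defaultModel }
print(chosen ?? "no entry for this device")
```

The point of the fallback is that an unsupported request never reaches the loader, so the app avoids the kind of failed or stalled load reported in the original post.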

Dec 03 '24 19:12 ZachNagengast

I have the same issue. I'm trying to use the distil-whisper_distil-large-v3 model, and I'm on an M1 Max Mac with 32GB, which should be fairly quick.

Jan 04 '25 02:01 neo773

I think that for the larger models, the first-time loading time for the audio encoder will be pretty long no matter how powerful the processor is, as the system still has to compile the model for the ANE, which seems to be a single-threaded process (from what I can tell so far).
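One way to check whether the slow part really is a one-time compile is to time two consecutive loads: if the first is far slower than the second, that would be consistent with the compile step described above. A minimal Swift sketch follows; `timed` and `loadModelPlaceholder` are hypothetical names, and the placeholder only sleeps, so swap in whatever call actually loads the model.

```swift
import Foundation

/// Runs `operation`, prints how long it took, and returns the result with the elapsed seconds.
func timed<T>(
    _ label: String,
    _ operation: () async throws -> T
) async rethrows -> (result: T, seconds: TimeInterval) {
    let start = Date()
    let result = try await operation()
    let seconds = Date().timeIntervalSince(start)
    print("\(label) took \(String(format: "%.2f", seconds)) s")
    return (result, seconds)
}

// Hypothetical stand-in for the real model load; it only sleeps for 50 ms.
func loadModelPlaceholder() async throws -> String {
    try await Task.sleep(nanoseconds: 50_000_000)
    return "loaded"
}

// Time two consecutive loads and compare the numbers in the log.
let first = try await timed("first load") { try await loadModelPlaceholder() }
let second = try await timed("second load") { try await loadModelPlaceholder() }
```

This only measures; it does not add a timeout, so a load that never finishes will still never return.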

Jan 09 '25 23:01 sunflsks

I’m pretty sure it’s stuck indefinitely. I left it running, hoping it would load, but it never does.

Jan 09 '25 23:01 neo773

To be fair, it took almost 10 minutes for openai_whisper-large-v3-v20240930_turbo_632MB to load on my M1 Pro (and far less time to load openai-whisper-tiny), so considering you're loading the full model, it might take far, far longer. But I'm also not familiar with the internals of the ANE, so who knows 🤷

(Side note: I checked with powermetrics, and it appears the ANE compiler service is not single-threaded. However, it runs solely on the efficiency cores of the processor, which explains a lot! Quite silly in my opinion, but I guess it makes sense for the majority of use cases. Oh well.)

Jan 10 '25 00:01 sunflsks