
Error when using self-converted models

OutisLi opened this issue 1 year ago · 3 comments

I converted the BELLE-2/Belle-whisper-large-v3-zh model from PyTorch to Core ML using whisperkittools generate. After it was successfully converted to the Core ML format, I used the CLI to transcribe. However, it gets stuck on initializing the models. My device is a Mac mini with M4 Pro. The command and output are below:

"whisperkit-cli" transcribe --language zh --audio-path "/Users/tiancheng/Downloads/20min.mp3" --model-path "/Users/tiancheng/AI_Models/whisperkit-coreml/Belle-whisper-large-v3-turbo-zh_667MB" --concurrent-worker-count 0 --report-path "/Users/tiancheng/Downloads/results" --report

Error: Tokenizer is unavailable

I copied the original tokenizer.json into the model folder, but it still gets stuck on initializing.

OutisLi avatar Dec 08 '24 07:12 OutisLi

@OutisLi Are you on an M1 device by any chance? *turbo* models are incompatible with M1 devices but you can still generate non-turbo models with --audio-encoder-sdpa-implementation Cat while executing whisperkit-generate-model. The device compatibility map is published here and our TestFlight app demonstrates how to leverage this file (or a similar file) in your app.
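
For example, something along these lines (the model version and output directory below are only illustrative, taken from the original report):

whisperkit-generate-model --model-version BELLE-2/Belle-whisper-large-v3-zh --output-dir /path/to/output --audio-encoder-sdpa-implementation Cat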

atiorh avatar Dec 09 '24 05:12 atiorh

> @OutisLi Are you on an M1 device by any chance? *turbo* models are incompatible with M1 devices but you can still generate non-turbo models with --audio-encoder-sdpa-implementation Cat while executing whisperkit-generate-model. The device compatibility map is published here and our TestFlight app demonstrates how to leverage this file (or a similar file) in your app.

I first tried this model on a MacBook Pro with M1 Pro and got this error. Then I used a Mac mini with M4 Pro, and the non-quantized turbo model could finally run. The convert command is:

whisperkit-generate-model --model-version BELLE-2/Belle-whisper-large-v3-turbo-zh --output-dir /Users/outisli/Downloads --generate-quantized-variants --generate-decoder-context-prefill-data

However, when I tried this converted model with the CLI through subprocess.run in Python, it took a long time to initialize the model every time I ran it, even after the first run (invocation sketch below). Meanwhile, --generate-quantized-variants produces a 520MB model, but its result is:

Transcription of 20min.mp3:

.com. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .com. . . . . . . .。 . . . . . I don't.

while the full turbo model seems to produce correct output.
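
For reference, the invocation from Python looks roughly like this (a simplified sketch; the paths and flags are placeholders copied from the command above, not a verbatim copy of my script):

```python
import subprocess

# Call whisperkit-cli the same way as on the command line, just via Python.
result = subprocess.run(
    [
        "whisperkit-cli", "transcribe",
        "--language", "zh",
        "--audio-path", "/Users/tiancheng/Downloads/20min.mp3",
        "--model-path", "/Users/tiancheng/AI_Models/whisperkit-coreml/Belle-whisper-large-v3-turbo-zh_667MB",
        "--concurrent-worker-count", "0",
        "--report-path", "/Users/tiancheng/Downloads/results",
        "--report",
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
```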

OutisLi avatar Dec 09 '24 05:12 OutisLi

The default quantization recipe may not work on every model out of the box; e.g., 520MB is pretty aggressive for large-v3-turbo. I recommend tuning the compression parameters to get closer to 620MB for this particular model (based on our experience). Feel free to drop by our Discord for help: https://discord.gg/G5F5GZGecC

atiorh avatar Dec 09 '24 11:12 atiorh