
Core ML support

Open ggerganov opened this issue 1 year ago • 6 comments

Running Whisper inference on Apple Neural Engine (ANE) via Core ML

Huge thanks to @wangchou for demonstrating how to use Core ML and making the initial port: https://github.com/ggerganov/whisper.cpp/discussions/548

WIP: everything in this branch is subject to change

Currently, we have the Encoder fully running on the ANE through Core ML inference. The performance gain seems to be more than x3 compared to 8-thread CPU (tested with the tiny, base and small models).

Here are initial performance benchmarks for the Encoder with (top) and without (bottom) Core ML:

CPU OS Config Model Th Load [ms] Encode [ms] Commit
MacBook M1 Pro MacOS 13.2.1 CORE ML tiny 4 50 30 b0ac915
MacBook M1 Pro MacOS 13.2.1 CORE ML base 4 74 64 b0ac915
MacBook M1 Pro MacOS 13.2.1 CORE ML small 4 188 208 b0ac915
MacBook M1 Pro MacOS 13.2.1 CORE ML medium 4 533 1033 b0ac915
MacBook M1 Pro MacOS 13.2.1 CORE ML large 4 ? ? b0ac915
---
MacBook M1 Pro MacOS 13.0.1 NEON BLAS tiny 8 71 102 206fc93
MacBook M1 Pro MacOS 13.0.1 NEON BLAS base 8 96 220 206fc93
MacBook M1 Pro MacOS 13.0.1 NEON BLAS small 8 233 685 206fc93
MacBook M1 Pro MacOS 13.0.1 NEON BLAS medium 8 603 1928 206fc93
MacBook M1 Pro MacOS 13.0.1 NEON BLAS large 8 1158 3350 206fc93
---
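
As a sanity check, the encode-time speedup implied by the two tables can be computed directly from the numbers above (the large row is omitted since its Core ML timings are still missing):

```python
# Encode times [ms] copied from the tables above (MacBook M1 Pro)
coreml_ms = {"tiny": 30, "base": 64, "small": 208, "medium": 1033}
cpu_ms    = {"tiny": 102, "base": 220, "small": 685, "medium": 1928}

for model in coreml_ms:
    speedup = cpu_ms[model] / coreml_ms[model]
    print(f"{model}: x{speedup:.1f}")
```

This matches the quoted "more than x3" for tiny, base and small; medium gains less (about x1.9), and note the two runs also differ in thread count and commit.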

Usage

  • Download the Core ML encoder .mlmodel and compile it to .mlmodelc:

    ./models/download-coreml-model.sh base.en
    xcrun coremlc compile ./models/ggml-base.en.mlmodel ./models
    

    The .mlmodel files are currently hosted at:

    https://huggingface.co/datasets/ggerganov/whisper.cpp-coreml

  • Build whisper.cpp with Core ML support:

    # using Makefile
    make clean
    WHISPER_COREML=1 make -j
    
    # using CMake
    mkdir build && cd build
    cmake -DWHISPER_COREML=1 ..
    make -j
    
  • Run the examples as usual. The first run on a device is slow, since the ANE service compiles the Core ML model to a device-specific format. Subsequent runs are faster.

TODO

  • [ ] Can the Decoder be ported to ANE too? https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5199310
  • [ ] Convert the medium and large models to Core ML format and upload to HF. Needs an Apple Silicon Mac with 64 GB RAM to do the conversion from PyTorch -> Core ML
  • [ ] Unified ggml + Core ML model file. We currently load both the full ggml model (encoder + decoder) and the Core ML encoder - not optimal
  • [ ] Scripts for generating Core ML model files (e.g. https://github.com/wangchou/callCoreMLFromCpp)
  • [ ] Support loading the Core ML model from a memory buffer. Currently we only support loading from a folder on disk
  • [ ] Progress report for initial-run model processing
  • [ ] Adjust memory usage buffers when using Core ML
  • [ ] Try to avoid the first on-device automatic model generation (it takes a long time)
  • [ ] The medium model takes more than 30 minutes to convert on the first run. Is there a work-around?
  • [ ] Can we run the Core ML inference on the GPU?

ggerganov avatar Mar 05 '23 09:03 ggerganov

Great work!

I tested coreml branch on Mac Mini M2 (base $599 model).

The performance gain seems to be more than x5 compared to 4-thread CPU (thanks to the much faster ANE on the M2; on the base Mac Mini M2, the 8-thread CPU is slower than 4-thread).

Performance benchmarks for the Encoder with (top) and without (bottom) Core ML:

CPU OS Config Model Th Load Enc. Commit
Mac Mini M2 macOS 13.2.1 CORE ML tiny 4 44 25 17a1459
Mac Mini M2 macOS 13.2.1 CORE ML base 4 66 54 17a1459
Mac Mini M2 macOS 13.2.1 CORE ML small 4 163 190 17a1459
Mac Mini M2 macOS 13.2.1 CORE ML medium 4 ? ? 17a1459
Mac Mini M2 macOS 13.2.1 CORE ML large 4 ? ? 17a1459

CPU OS Config Model Th Load Enc. Commit
Mac Mini M2 macOS 13.2.1 NEON BLAS tiny 4 40 142 59fdcd1
Mac Mini M2 macOS 13.2.1 NEON BLAS base 4 67 299 59fdcd1
Mac Mini M2 macOS 13.2.1 NEON BLAS small 4 152 980 59fdcd1
Mac Mini M2 macOS 13.2.1 NEON BLAS medium 4 ? ? 59fdcd1
Mac Mini M2 macOS 13.2.1 NEON BLAS large 4 ? ? 59fdcd1

brozkrut avatar Mar 06 '23 16:03 brozkrut

I compiled whisper.cpp with Core ML support using make and also built the mlmodel, but I'm getting an error:

whisper_init_from_file: loading model from 'models/ggml-base.en.mlmodelc'
whisper_model_load: loading model
whisper_model_load: invalid model data (bad magic)
whisper_init: failed to load model
error: failed to initialize whisper context

Is there anything else I'm missing? 🤔

DontEatOreo avatar Mar 09 '23 11:03 DontEatOreo

@DontEatOreo

On the command line, you still have to specify the non-coreml model: models/ggml-base.en.bin. The code will automatically also load the models/ggml-base.en.mlmodelc if it is present in the same folder.
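
In other words, the lookup is a suffix swap next to the ggml file. A minimal sketch of the probing logic (illustrative only; the exact naming convention in whisper.cpp may change between versions):

```python
from pathlib import Path

def coreml_model_path(ggml_path: str) -> str:
    # Swap the .bin suffix for .mlmodelc to get the companion Core ML model path
    return str(Path(ggml_path).with_suffix(".mlmodelc"))

print(coreml_model_path("models/ggml-base.en.bin"))
# -> models/ggml-base.en.mlmodelc
```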

ggerganov avatar Mar 09 '23 11:03 ggerganov

@ggerganov Thank you! I was very confused about why it wasn't working even though I did everything right

DontEatOreo avatar Mar 09 '23 12:03 DontEatOreo

This is great. Excited to see how this feature develops. Leveraging ANE would be huge, even more if the decoder was possible to port to it.

dennislysenko avatar Mar 22 '23 19:03 dennislysenko

Just saw this was announced, is it useful? https://github.com/apple/ml-ane-transformers

strangelearning avatar Mar 24 '23 17:03 strangelearning

@DontEatOreo

On the command line, you still have to specify the non-coreml model: models/ggml-base.en.bin. The code will automatically also load the models/ggml-base.en.mlmodelc if it is present in the same folder.

Does this mean we have to bundle both files with the app? Asking since the file size gets fairly large having to include them all.

cerupcat avatar Apr 05 '23 18:04 cerupcat

Hey, thanks for this awesome project! I am trying to run the whisper.objc example with CoreML but running into some issues. Has someone successfully done this and could guide me on how to set it up?

lucabeetz avatar Apr 14 '23 14:04 lucabeetz

@DontEatOreo On the command line, you still have to specify the non-coreml model: models/ggml-base.en.bin. The code will automatically also load the models/ggml-base.en.mlmodelc if it is present in the same folder.

Does this mean we have to bundle both files with the app? Asking since the file size gets fairly large having to include them all.

The solution is to produce an encoder-only Core ML model in one file and a decoder-only standard model in another file. This is not very difficult to achieve, but supporting so many model files might become too difficult for me. So I will probably rely on someone helping out and demonstrating how this can be done, either as an example in this repo or in a fork.

ggerganov avatar Apr 14 '23 17:04 ggerganov

This is almost ready to merge. I am hoping to do it tomorrow.

The most important part that currently needs testing is the creation of the CoreML models, following the instructions here:

https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5327027

If you give this a try, please let us know the results and whether you encountered any issues. Also, let us know if you used quantized or non-quantized Core ML models and what the experience has been.

I believe that the tiny, base and small models should be supported, while medium and large do not seem viable for this approach.

ggerganov avatar Apr 14 '23 19:04 ggerganov

1.4 GB for medium sounds fine for users, but you're saying there are other limitations against it?

aehlke avatar Apr 14 '23 20:04 aehlke

@aehlke The scripts for generating Core ML models support all sizes, but on my M1 Pro it takes a very long time (i.e. more than half an hour) to generate the medium model. After that, the first run is also very slow. Subsequent runs are about 2 times faster than CPU-only.

In any case, you can follow the instructions in this PR and see how it works on your device.

ggerganov avatar Apr 15 '23 09:04 ggerganov

CPU OS Config Model Th Load Enc. Commit
MacBook Air M2 MacOS 13.3.1 NEON BLAS COREML tiny 4 41 31 f19e23f
MacBook Air M2 MacOS 13.3.1 NEON BLAS COREML base 4 59 57 f19e23f
MacBook Air M2 MacOS 13.3.1 NEON BLAS COREML small 4 147 195 f19e23f
MacBook Air M2 MacOS 13.3.1 NEON BLAS COREML medium 4 576 783 f19e23f
MacBook Air M2 MacOS 13.3.1 NEON BLAS COREML large 4 1196 2551 f19e23f

Great work! Converting the large model consumed ~9.7 GB of memory (with a short peak of 15.03 GB); it worked fine on an 8 GB Air.

Edit: I measured the Core ML model conversion time and the first-load conversion time (second load minus first load).

Model   Conversion time   First-load conversion [s]
tiny    4.915             0.72
base    8.564             1.34
small   26.050            4.72
medium  1:35.85           15.57
large   3:43.32           35.10
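
For easier comparison, the min:sec entries can be normalized to seconds (a small helper; the parsing format simply mirrors how the times are written in the table above):

```python
def to_seconds(t: str) -> float:
    # Parse "M:SS.ss" or "SS.ss" into seconds
    parts = t.split(":")
    if len(parts) == 2:
        return 60 * int(parts[0]) + float(parts[1])
    return float(parts[0])

conversion = {"tiny": "4.915", "base": "8.564", "small": "26.050",
              "medium": "1:35.85", "large": "3:43.32"}
for model, t in conversion.items():
    print(f"{model}: {to_seconds(t):.2f} s")
```

So the conversion time grows roughly 45x from tiny (~5 s) to large (~223 s) on this machine.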

neurostar avatar Apr 15 '23 19:04 neurostar

When running this script:

./models/generate-coreml-model.sh base.en

I got the error:

xcrun: error: unable to find utility "coremlc", not a developer tool or in PATH

CarberryChai avatar Apr 16 '23 04:04 CarberryChai

Is it just me, or is the link to the Core ML models missing on Hugging Face?

Btw, @ggerganov, if you need help converting the models, I'd be glad to contribute. It seems to me that it only needs to be done once. :)

flexchar avatar Apr 16 '23 09:04 flexchar

For now, you should generate the Core ML models locally by following the instructions. I don't want to host them on HF yet, because it is very likely that the models will change soon - there are some pending improvements (see https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5622733). If I upload them now, we will later get new models and everyone will be confused about which model they are using, etc.

ggerganov avatar Apr 16 '23 10:04 ggerganov

In that regard, I'd like to ask for help, since I can't seem to succeed with it:

python3.10 ./models/convert-whisper-to-coreml.py --model tiny

100% 72.1M/72.1M [00:05<00:00, 14.3MiB/s]
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=384, n_audio_head=6, n_audio_layer=4, n_vocab=51865, n_text_ctx=448, n_text_state=384, n_text_head=6, n_text_layer=4)
/opt/homebrew/lib/python3.10/site-packages/whisper/model.py:166: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape"
/opt/homebrew/lib/python3.10/site-packages/whisper/model.py:97: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  scale = (n_state // self.n_head) ** -0.25
Converting PyTorch Frontend ==> MIL Ops: 100% 367/368 [00:00<00:00, 6681.50 ops/s]
Running MIL frontend_pytorch pipeline: 100% 5/5 [00:00<00:00, 1047.63 passes/s]
Running MIL default pipeline: 100% 57/57 [00:00<00:00, 147.77 passes/s]
Running MIL backend_mlprogram pipeline: 100% 10/10 [00:00<00:00, 2599.51 passes/s]
Traceback (most recent call last):
  File "/Users/luke/dev/whisper.cpp/./models/convert-whisper-to-coreml.py", line 331, in <module>
    decoder = convert_decoder(hparams, decoder, quantize=args.quantize)
  File "/Users/luke/dev/whisper.cpp/./models/convert-whisper-to-coreml.py", line 283, in convert_decoder
    traced_model = torch.jit.trace(model, (token_data, audio_data))
  File "/opt/homebrew/lib/python3.10/site-packages/torch/jit/_trace.py", line 741, in trace
    return trace_module(
  File "/opt/homebrew/lib/python3.10/site-packages/torch/jit/_trace.py", line 958, in trace_module
    module._c._create_method_from_trace(
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/model.py", line 211, in forward
    x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/model.py", line 138, in forward
    x = x + self.cross_attn(self.cross_attn_ln(x), xa, kv_cache=kv_cache)[0]
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/model.py", line 83, in forward
    k = self.key(x if xa is None else xa)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/model.py", line 37, in forward
    return F.linear(
RuntimeError: mat1 and mat2 shapes cannot be multiplied (384x1500 and 384x384)

flexchar avatar Apr 16 '23 11:04 flexchar

When running this script:

./models/generate-coreml-model.sh base.en

I got the error:

xcrun: error: unable to find utility "coremlc", not a developer tool or in PATH

I was able to resolve by sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

neurostar avatar Apr 16 '23 13:04 neurostar

Hi, which version of Python should I use to install these dependencies? I tried 3.11 and 3.10, but failed to install all dependencies.

pip install ane_transformers
pip install openai-whisper
pip install coremltools

flyisland avatar Apr 17 '23 02:04 flyisland

Hi, which version of Python should I use to install these dependencies? I tried 3.11 and 3.10, but failed to install all dependencies.

pip install ane_transformers
pip install openai-whisper
pip install coremltools

https://github.com/openai/whisper/discussions/906#discussioncomment-4803242 My computer has both Python 3.9 and 3.11 installed. After setting the default to 3.9, I still couldn't find the whisper module and had to uninstall Python 3.11 to make it work. This suggests that pip needs to be fully linked to a Python version below 3.10 to function properly.

adolphnov avatar Apr 17 '23 03:04 adolphnov

@adolphnov and @flyisland I have no idea how these Python versions work. I'm just using whatever is default on my M1. You can give me some commands I can run to tell you the versions that I have, or send a PR to improve the setup process.
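
For reference, a generic snippet along these lines reports which interpreter is in use (nothing whisper.cpp-specific; 3.9/3.10 are the versions reported to work in this thread):

```python
import sys
import platform

# Print the Python toolchain details useful for debugging the conversion setup
print("python :", platform.python_version())
print("machine:", platform.machine())
print("exe    :", sys.executable)
```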

@flexchar You are running the wrong script. Use ./models/generate-coreml-model.sh tiny as specified in the instructions

ggerganov avatar Apr 17 '23 09:04 ggerganov

Thank you, G. To clarify for others: I also ran into the xcrun: error: unable to find utility "coremlc", not a developer tool or in PATH problem.

I didn't know (I'm new to Mac) that I had to install Xcode. I also had trouble installing pip packages on Python 3.11 (the latest at the time of writing), so I purged Python and did a fresh brew install [email protected]. Then I had to add export PATH="/opt/homebrew/opt/[email protected]/libexec/bin:$PATH" to my shell file, and then the conversion worked.

This is awesome.

Georgi, you should consider sponsor button on this repo. I believe there are many that appreciate your work. Thank you for doing this.

PS: It took 2.5 min to convert the largest model. It's going really smoothly.

flexchar avatar Apr 17 '23 10:04 flexchar

Yes, 3.11 fails for me as well when installing one of the packages via pip, but 3.10.x should work (although converting "large" got stuck on my M1 Pro for hours and I had to force quit it; I will try again later to see how it goes, since it seems to work for others here).

For managing Python versions, you can also use a manager such as pyenv or asdf. You can set a local version so that 3.10.x is always used when you enter the whisper.cpp directory, and some other version elsewhere.

wzxu avatar Apr 17 '23 10:04 wzxu

@ggerganov Thanks, I managed to install those dependencies using Python 3.9, but ran into the xcrun: error: unable to find utility "coremlc", not a developer tool or in PATH problem.

Do I need to install the Full Xcode Package to have the "coremlc"?

flyisland avatar Apr 17 '23 13:04 flyisland

@flyisland as I've mentioned in my reply, yes, you need to install Xcode.

https://apps.apple.com/us/app/xcode/id497799835

flexchar avatar Apr 17 '23 13:04 flexchar

My Xcode installation was pointing at the wrong location, so I used sudo xcode-select --reset to resolve the missing coremlc problem. In practice it sets the active developer directory to the same path as mentioned earlier (/Applications/Xcode.app/Contents/Developer). You can check the current path with xcode-select -p.

sriver avatar Apr 17 '23 14:04 sriver

Hi @ggerganov, thanks for merging the CoreML branch into master!

I'm seeing a 10%-13% performance drop though. Is that expected?

Running bench with

$ ./build-1.3.0/bin/bench -m models/ggml-small.en.bin
$ ./build-coreml/bin/bench -m models/ggml-small.en.bin

on MacBook Pro M1 Max:

Coreml branch [ms] V1.3.0 [ms]
598 669
591 680
580 671

Thanks :)

bjnortier avatar Apr 17 '23 14:04 bjnortier

Yes 3.11 fails for me as well during installing one of the package via pip, but 3.10.x should work (although converting "large" got stuck on my M1 Pro for hours so I had to force quit it; will try again later to see how it goes since it seems to work for others here).

Still getting stuck at this step forever, with no further output whatsoever. 😕 I can't interrupt with Ctrl-C either, so I had to quit the terminal and force quit ANECompilerService. Any idea what the cause may be? An M1 Pro with 16 GB should be sufficient…? [screenshot: SCR-20230418-biqs]

wzxu avatar Apr 17 '23 17:04 wzxu

Thank you, @ggerganov and @flexchar. With Python 3.9 and the full Xcode package installed on my laptop, it is now working. I can see the message "whisper_init_state: loading Core ML model from 'models/ggml-base.en-encoder.mlmodelc'" in the output of both the ./main and ./stream programs.

flyisland avatar Apr 17 '23 23:04 flyisland

Everything is working, except I'm getting the whisper_init_state: first run on a device may take a while ... notice (and resulting 15–30 minute wait) on every run. Is there some way around this?

ecormany avatar Apr 18 '23 20:04 ecormany