
Stuck on convert_encoder while converting to CoreML

clyang opened this issue

I've tried to convert the small model to CoreML format on a Mac M1 by following the CoreML instructions.

However, the process gets stuck after the Running MIL backend_mlprogram pipeline step. I can see ANECompilerService using 100% CPU in top, but the conversion process never ends.
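
For reference, these are roughly the steps I ran, following the CoreML section of the README (package versions as listed below):

    pip install ane_transformers openai-whisper coremltools
    ./models/generate-coreml-model.sh small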

My environment:

  • Macbook Pro (M1 silicon)
  • Python 3.9
  • whisper==1.1.10
  • openai-whisper==20230314
  • ane-transformers==0.1.3

clyang avatar Apr 16 '23 16:04 clyang

Same here. For the base and tiny models it's OK.

But for the small and larger models, the command gets stuck at 'Running MIL backend_mlprogram pipeline: 100%'.

crystoneme avatar Apr 17 '23 05:04 crystoneme

If it works for the tiny and base models, then I guess it is just taking a long time to process the bigger models. I don't know what determines how long it takes. On my M1 Pro, the medium model takes more than half an hour to process, but someone reported that it takes less than 2 minutes on their M2 Air: https://github.com/ggerganov/whisper.cpp/pull/566#issuecomment-1509936248

Maybe try restarting the computer, not sure.

ggerganov avatar Apr 17 '23 09:04 ggerganov

I have successfully executed ./models/generate-coreml-model.sh large on an M1; it took about 50 minutes. It would be better to note an approximate time in the documentation @ggerganov

ficapy avatar Apr 17 '23 09:04 ficapy

Good advice. Maybe it just needs more time to process.

crystoneme avatar Apr 17 '23 15:04 crystoneme

I think it would be better to have some sort of progress update if possible. It just looks like it's hanging.

kyteague avatar Apr 17 '23 23:04 kyteague

I've successfully converted all the models; the time ranged from 1 min to 60 min. No errors, they just need time.

System: MacBook Pro M1 Pro

crystoneme avatar Apr 18 '23 01:04 crystoneme

Can you let me know your macOS version?

clyang avatar Apr 18 '23 17:04 clyang

Something's not quite right here. It took my M1 Max with 64 GB RAM exactly 4 hours to convert base.en and almost 3 hours to convert medium.en (which didn't load, btw). Could someone share details such as their torch and Python versions?

I am using torch 2.1.0-dev20230417 and Python 3.10.10.

edwios avatar Apr 19 '23 18:04 edwios

FWIW I'm not entirely convinced that waiting is required or is the full answer here. I waited for multiple hours converting the medium model and it didn't finish, but if I force-quit ANECompilerService after waiting for a few mins the process appears to complete successfully.

That said, on my Mac I end up with the same issue with the converted model in Xcode, both at runtime and when performance benchmarking the model – it gets stuck compiling the model and never finishes. Sometimes if I'm lucky the compilation appears to happen immediately and I can use the model as usual for that run of the program – mostly only in Debug builds though. Seems to be a bug in the compiler service.

I have the same issues whether I make an mlprogram or an mlmodel, but the mlprogram seems to show the problem more often / worse.

I have torch==2.0, Python 3.10, M1 MBP, 16 GB RAM.
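
To be concrete, by force-quitting I mean something along these lines (the process name is what shows up in top / Activity Monitor; whether sudo is needed may depend on your setup):

    sudo pkill -9 ANECompilerService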

ephemer avatar May 03 '23 13:05 ephemer

I have the same problem. The first run finished after a few minutes, but after I updated my system and ran it a second time, ANECompilerService kept running for 10 hours. When I force-quit it, the main binary continued and gave me the correct result.

RogerPu avatar May 15 '23 14:05 RogerPu

I experienced the same problem. As soon as I force-quit ANECompilerService, the conversion completed quickly. (model: large / ~2 min with an M1 Pro 14, 16 GB RAM)

I also encountered the same issue when loading the model, but once again, the model loaded successfully right after I force-quit ANECompilerService.

hoonlight avatar May 22 '23 05:05 hoonlight

Glad to know I'm not the only one having this issue.

clyang avatar May 22 '23 06:05 clyang

Same here.

cnsilvan avatar May 25 '23 18:05 cnsilvan

Same here. 68 minutes, then I found the solution of killing ANECompilerService and it finished.

arrowcircle avatar May 30 '23 11:05 arrowcircle

@archive-r @arrowcircle

However, even after killing ANECompilerService, the same issues occur when you run it again.

I have successfully used the base option and experienced no issues during subsequent runs. However, when I tried using the medium option, it became stuck.

Erimus-Koo avatar Jun 30 '23 10:06 Erimus-Koo

I am having the exact same issue - if I kill ANECompilerService, the CoreML-compiled main continues on and begins to work. What is the issue here? It SEEMS to be recompiling the model each time.

janngobble avatar Jul 10 '23 20:07 janngobble

Same. Tried generate-coreml-model.sh a few times with both medium.en and large, and even let it run 8 hours or so overnight; it never completed.

After sudo kill -9 on ANECompilerService (sending SIGTERM didn't work), the process finished almost immediately.

Afterwards, running the model hangs indefinitely at whisper_init_state: first run on a device may take a while ...

If I again send SIGKILL to ANECompilerService, it finishes within seconds and correctly transcribes the audio.
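
For anyone else hitting this, the exact incantation that worked for me was roughly the following (ANECompilerService is the name as it appears in ps / Activity Monitor):

    # while generate-coreml-model.sh, or the first model load, is hung:
    sudo kill -9 $(pgrep ANECompilerService)

I had to do this once during conversion and once more the first time whisper.cpp loaded the converted model.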

n8henrie avatar Jul 27 '23 13:07 n8henrie

Killing ANECompilerService with -9 works. But I must do this every single time I start a normal transcription. This can't be normal...

Any suggestions?

carstenuhlig avatar Aug 07 '23 11:08 carstenuhlig

The same situation: MBP M1 Max, 32 GB, macOS 13.5.1.

eual8 avatar Sep 07 '23 15:09 eual8

Same thing was happening to me with any of the downloadable ones I could find; building it locally on my machine worked fine though. Took ~10 min for small.en on my M1 MBP.

philk avatar Sep 07 '23 16:09 philk

How did you do this? Using the convert-pt-to-ggml.py script?

eual8 avatar Sep 09 '23 09:09 eual8

I was able to convert the whisper large-v2.pt model (it took less than 1 minute). No errors now, thanks for the advice. I did everything as written here: https://github.com/ggerganov/whisper.cpp/blob/master/models/README.md

eual8 avatar Sep 09 '23 12:09 eual8

I used the instructions here https://github.com/ggerganov/whisper.cpp/pull/566, under the Usage section.

philk avatar Sep 09 '23 19:09 philk

The process got stuck on the large model and it had been over 10 hours. Force-killing ANECompilerService in Activity Monitor (Memory tab) helped me.

Sogl avatar Sep 16 '23 08:09 Sogl