
How to ggml-ify other fine-tuned whisper models?

Open haqatak opened this issue 1 year ago • 1 comments

Hi, I would love to use this model https://huggingface.co/pere/whisper-NST2. I tried pointing the conversion script (convert-pt-to-ggml.py) at the pytorch_model.bin file, but I get an error:

Traceback (most recent call last):
  File "convert-pt-to-ggml.py", line 209, in <module>
    hparams = checkpoint["dims"]
KeyError: 'dims'

Can someone point me in the right direction? thanks :)

haqatak avatar Mar 15 '23 22:03 haqatak

Follow the instructions here: https://github.com/ggerganov/whisper.cpp/tree/master/models#fine-tuned-models So, assuming you have the whisper and whisper.cpp directories side by side and you are in the whisper.cpp directory:

git clone https://huggingface.co/pere/whisper-NST2

(You already have the .bin file, so just put it in that directory.)

python3 models/convert-h5-to-ggml.py whisper-NST2 ../whisper custom

The last parameter (custom) is just the name of the directory where I keep my custom models. After a minute or so, you will have a file named custom/ggml-model.bin, and you can run

./main -f input.wav -m custom/ggml-model.bin -l your_language

And that's it.
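
For context on the KeyError above: original OpenAI Whisper checkpoints are dictionaries with a "dims" entry (hyperparameters) and a "model_state_dict" entry (weights), while a Hugging Face pytorch_model.bin is a flat mapping of weight names to tensors, so checkpoint["dims"] does not exist there. A minimal sketch of telling the two apart follows. The function name guess_checkpoint_format is hypothetical, and the key prefixes used for the Hugging Face case are an assumption about typical Whisper weight names, not something taken from the conversion scripts.

```python
from typing import Any, Mapping


def guess_checkpoint_format(checkpoint: Mapping[str, Any]) -> str:
    """Guess which layout a loaded checkpoint dictionary uses.

    The dictionary would typically come from something like
    torch.load(path, map_location="cpu"); loading is left out here
    so the sketch runs without PyTorch installed.
    """
    # Original OpenAI checkpoints bundle hyperparameters under "dims"
    # and the weights under "model_state_dict".
    if "dims" in checkpoint and "model_state_dict" in checkpoint:
        return "openai"  # the layout convert-pt-to-ggml.py reads

    # Assumption: Hugging Face Whisper weights are stored flat, with names
    # beginning with "model." or "proj_out.".
    if any(key.startswith(("model.", "proj_out.")) for key in checkpoint):
        return "huggingface"  # use convert-h5-to-ggml.py instead

    return "unknown"


if __name__ == "__main__":
    # Tiny stand-ins for real checkpoints, for illustration only.
    print(guess_checkpoint_format({"dims": {}, "model_state_dict": {}}))
    print(guess_checkpoint_format({"model.encoder.conv1.weight": None}))
```

If the result is "huggingface", the convert-h5-to-ggml.py route described above is the one to try.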

ottokiksmaler avatar Mar 16 '23 09:03 ottokiksmaler

Perfect, that worked. Thanks for the swift reply :D

haqatak avatar Mar 16 '23 19:03 haqatak

For some reason, my fine-tuning run (following that tutorial) doesn't seem to create a vocab.json file in the checkpoint folders. Is there a step that generates that file that I could have missed? The conversion script expects it.

Also, I'm assuming that the checkpoint directories are the final product of training. Perhaps I am missing a finalization step somewhere?
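
A quick way to see what a checkpoint folder is missing before running the conversion is sketched below. The helper name missing_files and the file list are assumptions: only vocab.json is confirmed in this thread as something the conversion script expects, and the other two names are guesses that should be checked against the script itself.

```python
from pathlib import Path

# Assumption: files the conversion script may read from the model directory.
# Only vocab.json is confirmed by this thread; verify the rest in the script.
EXPECTED_FILES = ("vocab.json", "added_tokens.json", "config.json")


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_model_dir.py path/to/checkpoint-folder
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    absent = missing_files(folder)
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All expected files are present.")
```

If vocab.json turns up missing, one thing worth checking is whether the tokenizer was saved alongside the model weights, since tokenizer files are usually written separately from the weights.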

jasontitus avatar Apr 12 '23 22:04 jasontitus