Adding custom finetuned (and converted+quantized) model to Ollama
Hey there!
First off: Thank you for this amazing model and all the work you put into it. So far this thing is really impressive (especially at this size).
I've been messing around with it a lot over the last couple of days because I have a project in mind. I used LLaVA to build a more task-specific dataset, which I then used to fine-tune the model. All of that worked pretty much flawlessly thanks to the finetuning notebook.
So now I have a .safetensors model which I can load as usual with the transformers library. So far so good. I want the whole project to run on a Raspberry Pi with 4 GB of RAM. I did a test beforehand with the base model and the official Ollama integration, which also worked really well and didn't run out of RAM (yay!)
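For context, this is roughly how I'm loading the fine-tuned checkpoint (the local path is just mine, and as far as I can tell `trust_remote_code` is needed for moondream's custom model class):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint from a local directory (example path);
# trust_remote_code pulls in moondream's custom model code.
model = AutoModelForCausalLM.from_pretrained(
    "./moondream-finetuned", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("./moondream-finetuned")
```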
So the next step is to create the GGUFs and a custom Modelfile. create_gguf.py worked and produced the two files, one for the text model and one for the projector. But here's where I'm currently stuck: I can write a Modelfile (basically a copy of the moondream:v2 one that I got with `ollama show moondream:v2 --modelfile`, just with the paths changed to point at my GGUFs), roughly like this:
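(Filenames below are placeholders, and I've elided the TEMPLATE and PARAMETER lines since I copied them over unchanged from the `ollama show` output.)

```
# Copy of the moondream:v2 Modelfile, with only the FROM paths changed
# to point at my own GGUFs (filenames are placeholders).
FROM ./moondream-text.gguf
FROM ./moondream-mmproj.gguf
# ...original TEMPLATE and PARAMETER lines, unchanged...
```

I then build it with `ollama create moondream-custom -f Modelfile` (the model name is just what I picked).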
But when I try to run the resulting model, the runner crashes with:

```
llama runner process has terminated: exit status 0xc0000409
```
I also noticed that my files are much larger than the ones Ollama uses: ~888 MB for the projector and ~2772 MB for the text model, vs. 910 MB and 829 MB. The text model looks like it's still fp16 (~2772 MB is about what I'd expect for a model this size at 16-bit), while Ollama's ~829 MB looks quantized down to roughly 4-bit. So I'm guessing that Ollama isn't using the create_gguf.py output directly, or that there's some custom quantization/conversion code somewhere. But idk, there's a very high chance that I'm just missing something very obvious lol.
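If it really is just a missing quantization pass, would something like this be the right move? This assumes the standard llama.cpp tooling works on these GGUFs; the command name and quant type are my guesses, not anything I found in the repo:

```
# Guess: requantize the fp16 text-model GGUF down to ~4-bit with llama.cpp's
# quantize tool (called plain "quantize" in older builds); the projector
# GGUF presumably stays fp16. Filenames are placeholders.
./llama-quantize moondream-text-f16.gguf moondream-text-Q4_K_M.gguf Q4_K_M
```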
So I guess my question is: how can I convert my fine-tuned safetensors model into the right (small) GGUF format and add the resulting files as a custom model to Ollama?
Hope you can help and thank you so much in advance!