[question] how do I save a loaded model?
using llama.cpp library I do:
struct llama_model* model = llama_load_model_from_file(input_model_path, params);
How do I save it back to disk in gguf format?
I'm asking because I wrote a program to modify model weights: I load a GGUF, modify the weights, and then need to save the result back to disk.
This is currently not implemented
@ggerganov that would be very useful.
The llama_model interface does not allow modifying tensors. It's a read-only representation of the loaded model.
If you want to modify tensors, either use the gguf_* functions provided by ggml, or use gguf-py to modify them in Python (note: gguf-py does not support reading Q-type quants).
You can read examples/gguf to see how it works.
Never mind. I modified the quantize program, and now I can modify tensors of any model at any quantization level. Too bad llama.cpp does not support this directly.
This issue was closed because it has been inactive for 14 days since being marked as stale.