[Feature] Export LoRA Adapters as GGML
llama.cpp dropped support for converting LoRA adapters to GGML. It would be very useful if we could use adapters with llama.cpp directly instead of fusing or merging them into the fine-tuned model.
Can you say more about what you are looking for?
Is it a separate GGUF file which contains the adapters, so that you can load the base model GGUF as well as the adapter GGUF in llama.cpp? Does llama.cpp support that?
@awni Yes, in llama.cpp you can specify `--lora ggml-adapter-model.bin`. The issue is that after training, mlx_lm outputs adapter.safetensors, which llama.cpp does not recognize. I know for certain that the supported output format is GGML, but I have read there has been work to support GGUF.
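For reference, this is the workflow I mean (a minimal sketch; the file names are hypothetical, and it assumes a llama.cpp build from before the `--lora` option was dropped):

```python
# Hypothetical invocation of llama.cpp's legacy `main` binary with a LoRA
# adapter: the base model stays in GGUF, the adapter is the old GGML-style file.
import subprocess

subprocess.run(
    [
        "./main",
        "-m", "models/base-model.gguf",      # base model (GGUF) - hypothetical path
        "--lora", "ggml-adapter-model.bin",  # LoRA adapter - hypothetical path
        "-p", "Hello",                       # prompt
    ],
    check=True,
)
```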
It looks like it's a file format called GGLA (possibly a simplified version of GGML? or GGUF?):
https://github.com/ggerganov/llama.cpp/blob/21be9cab94e0b5b53cb6edeeebf8c8c799baad03/examples/export-lora/export-lora.cpp#L225
If I'm reading this correctly, the format is something like this:
HEADER
- 4 bytes - file type identifier - 0x616C6767 ("algg")
- 4 bytes (uint32) - file type version - 1
- 4 bytes (uint32) - LoRA rank
- 4 bytes (uint32) - LoRA alpha

TENSORS (one after another)

ONE TENSOR METADATA
- 4 bytes (uint32) - tensor number of dims - n_dims
- 4 bytes (uint32) - tensor name length - namelen
- 4 bytes (uint32) - tensor data type (enum, e.g. 0 - FP32, 1 - FP16)
- 4 bytes * n_dims - length of each tensor dimension
- namelen bytes - tensor name

ONE TENSOR DATA (aligned to 32 bytes)
(I'm not entirely certain of the data format here... if/when I figure out more, I'll add it.)
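If it helps, here is a minimal sketch in Python of writing that layout, based purely on the field list above. It is an assumption rather than a tested converter: `write_ggla` and its parameters are hypothetical, everything is stored as FP32, and the tensor names in the safetensors file are assumed to already match whatever llama.cpp's LoRA loader expects (I believe the old loader keyed off `.loraA`/`.loraB` name suffixes, so mlx_lm's keys would likely need renaming, and possibly transposing, first — not shown).

```python
# Hypothetical GGLA writer, following the field layout described above.
import struct

import numpy as np
from safetensors.numpy import load_file

FTYPE_F32 = 0  # tensor data type enum: 0 - FP32, 1 - FP16


def write_ggla(adapter_path: str, out_path: str, lora_rank: int, lora_alpha: int) -> None:
    tensors = load_file(adapter_path)  # dict of name -> np.ndarray
    with open(out_path, "wb") as fout:
        # HEADER
        fout.write(struct.pack("<I", 0x67676C61))  # magic; first four bytes on disk read "algg"
        fout.write(struct.pack("<I", 1))           # file type version
        fout.write(struct.pack("<i", lora_rank))
        fout.write(struct.pack("<i", lora_alpha))

        for name, tensor in tensors.items():
            data = np.ascontiguousarray(tensor.astype(np.float32))
            sname = name.encode("utf-8")
            # ONE TENSOR METADATA: number of dims, name length, data type enum
            fout.write(struct.pack("<iii", data.ndim, len(sname), FTYPE_F32))
            # dimension lengths, written innermost-first (GGML convention, I believe)
            fout.write(struct.pack("<" + "i" * data.ndim, *data.shape[::-1]))
            fout.write(sname)
            # ONE TENSOR DATA, padded to a 32-byte boundary
            fout.write(b"\x00" * (-fout.tell() % 32))
            fout.write(data.tobytes())


if __name__ == "__main__":
    # Hypothetical usage; rank/alpha must match the values used for training.
    write_ggla("adapters.safetensors", "ggml-adapter-model.bin", lora_rank=8, lora_alpha=16)
```

Two things worth verifying against export-lora.cpp before trusting this: whether the dimension lengths really are written innermost-first relative to the numpy shape, and whether the magic is compared as a little-endian uint32.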