
[Feature] Export Lora Adapters as GGML

Open rmarnold opened this issue 1 year ago • 3 comments

llama.cpp dropped support for converting LoRA adapters to GGML. It would be very useful if we could use adapters directly with llama.cpp instead of fusing or merging the fine-tuned model.

rmarnold avatar Jun 05 '24 05:06 rmarnold

Can you say more about what you are looking for?

Is it a separate GGUF file containing the adapters, so you can load the base model GGUF together with the adapter GGUF in llama.cpp? Does llama.cpp support that?

awni avatar Jun 05 '24 17:06 awni

@awni, yes, in llama.cpp you can specify --lora ggml-adapter-model.bin. The issue is that after training, mlx_lm outputs adapters.safetensors, which llama.cpp does not recognize. I know for certain that the supported adapter format is ggml, but I have read there has been work to support gguf.
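For illustration, the intended usage would look something like the following (the model and adapter file names are placeholders; this assumes the older llama.cpp `main` binary, which accepted a `--lora` flag):

```shell
# Hypothetical invocation: load a GGML/GGLA-format LoRA adapter
# on top of a base GGUF model. Paths are placeholders.
./main -m models/base-model.gguf \
       --lora ggml-adapter-model.bin \
       -p "Once upon a time"
```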

rmarnold avatar Jun 06 '24 00:06 rmarnold

It looks like it's a file format called GGLA (possibly a simplified version of GGML? or GGUF?):

https://github.com/ggerganov/llama.cpp/blob/21be9cab94e0b5b53cb6edeeebf8c8c799baad03/examples/export-lora/export-lora.cpp#L225

If I'm reading this correctly, the format is something like this:

HEADER

  • 4 bytes - file magic: the bytes 0x61 0x6C 0x67 0x67 ("algg", i.e. "ggla" written reversed)
  • 4 bytes (uint32) - file type version 1
  • 4 bytes (uint32) - LoRA rank
  • 4 bytes (uint32) - LoRA alpha

TENSORS (one after another)

ONE TENSOR METADATA

  • 4 bytes (uint32) - tensor number of dims n_dims
  • 4 bytes (uint32) - tensor name length namelen
  • 4 bytes (uint32) - tensor data type (enum) (e.g. 0 = FP32, 1 = FP16)
  • 4 bytes (uint32) * n_dims - size of each tensor dimension
  • namelen bytes - tensor name (not null-terminated)

ONE TENSOR DATA (aligned to 32 bytes)

(I'm not entirely certain of the format of the data here... if/when I figure out more, I'll add it.)
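To sanity-check that reading, here's a minimal sketch in Python of a writer for the layout above. Assumptions (not an official spec): fields are little-endian, data type 0 means FP32, the magic is written as `b"ggla"` reversed, and data is padded to a 32-byte boundary; the tensor name and shape are made up for the example. Note that llama.cpp may expect dimensions in GGML (reversed) order relative to numpy.

```python
import io
import struct

def write_ggla(buf, rank, alpha, tensors):
    """Write a GGLA-style file per the layout described above.

    tensors: list of (name, raw_fp32_bytes, shape) tuples.
    """
    buf.write(b"ggla"[::-1])             # magic: bytes "algg" on disk
    buf.write(struct.pack("<I", 1))      # file format version
    buf.write(struct.pack("<I", rank))   # LoRA rank
    buf.write(struct.pack("<I", alpha))  # LoRA alpha
    for name, data, shape in tensors:
        name_b = name.encode("utf-8")
        # n_dims, namelen, data type (0 = FP32)
        buf.write(struct.pack("<III", len(shape), len(name_b), 0))
        for dim in shape:                # one uint32 per dimension
            buf.write(struct.pack("<I", dim))
        buf.write(name_b)                # tensor name, not null-terminated
        # pad so tensor data starts on a 32-byte boundary
        buf.write(b"\x00" * (-buf.tell() % 32))
        buf.write(data)                  # raw tensor data

buf = io.BytesIO()
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # a 2x2 FP32 tensor
write_ggla(buf, rank=8, alpha=16,
           tensors=[("layers.0.attention.wq.weight.loraA", data, (2, 2))])
raw = buf.getvalue()
print(raw[:4])  # -> b'algg'
```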

yonomitt avatar Jun 17 '24 13:06 yonomitt