gemma.cpp
Generate compressed weights file from finetune
How do I generate the compressed weights file (sbs) from my fine-tune? Say I want to convert the model assets at https://huggingface.co/google/gemma-2b-it/tree/main to the compressed weights file; how would I do that?
Thanks!
Hi @sanjay920, really cool that you're trying a fine-tune already. We're working on releasing a conversion script soon (hopefully within the next few days), but it would be useful to know which source formats to prioritize. What are you converting from?
Also, if others need a converter for a fine-tune, feel free to chime in here as well with what you'd use as a source format.
Ideally from a PeftModel, so I can convert the way llama.cpp allows: https://github.com/ggerganov/llama.cpp/blob/master/convert-lora-to-ggml.py
Alternatively, a converter from a merged model (the LoRA adapter merged into the base model, i.e. a GemmaModel) to sbs.
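For context on the "merge the adapter into the base model" route: in PEFT this is what `merge_and_unload()` does, and the underlying arithmetic is just adding the low-rank update into the base weight. A minimal sketch of that update with toy NumPy tensors (shapes and the `merge_lora` helper are illustrative, not part of any library):

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into a base weight: W' = W + (alpha / r) * (B @ A)."""
    return W + (alpha / r) * (B @ A)

# Toy shapes: a 4x4 base weight with a rank-2 adapter.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
A = rng.standard_normal((2, 4))  # LoRA down-projection
B = rng.standard_normal((4, 2))  # LoRA up-projection

W_merged = merge_lora(W, A, B, alpha=16, r=2)
```

After this merge the adapter is gone and the result is an ordinary dense checkpoint, which is why a merged-model-to-sbs converter covers the LoRA case too.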
Hi @sanjay920, a quick FYI on the implementation: Compressor in compression/compress-inl.h takes care of writing the SBS file, so that part is covered. The missing piece is getting your model into our CompressedArray<>, which is the part Austin was asking about.
I would like to convert a fine-tuned keras model to sbs, using the fine-tuning script from https://ai.google.dev/gemma/docs/lora_tuning
Hi @fengwang, there is a way to export the Keras weights to PyTorch through this script (it may need a small modification to remove xla if you don't want to use it), and then to convert the PyTorch weights to uncompressed gemma.cpp weights through util/convert_weights.py.
Currently this requires the dev branch because of the issues mentioned in #103; they were fixed in #114 and merged into the dev branch today.
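At its core, the Keras-to-PyTorch step is renaming parameters and saving them as a state_dict that the converter can read. A minimal sketch with toy tensors (the layer names and the name mapping here are illustrative placeholders, not Gemma's actual parameter names):

```python
import numpy as np
import torch

# Stand-in for exported Keras weights: a dict of numpy arrays keyed by layer name.
keras_weights = {
    "decoder_block_0/attention/query/kernel": np.ones((4, 4), dtype=np.float32),
    "final_norm/scale": np.ones((4,), dtype=np.float32),
}

# Hypothetical mapping from Keras layer names to PyTorch state_dict keys.
name_map = {
    "decoder_block_0/attention/query/kernel": "model.layers.0.self_attn.q_proj.weight",
    "final_norm/scale": "model.norm.weight",
}

state_dict = {name_map[k]: torch.from_numpy(v) for k, v in keras_weights.items()}

# This .pt file would then be the input to util/convert_weights.py.
torch.save(state_dict, "gemma_finetuned.pt")
```

The real export script additionally has to transpose or reshape some kernels to match PyTorch conventions; the sketch only shows the renaming and serialization shape of the pipeline.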
I think this is now working, please feel free to reopen if you'd like to discuss or have an issue with the scripts.