Conversion to ggml format
Could you provide a script to convert a model from the Lit-LLaMA format back to the original LLaMA format, so that it can be used in llama.cpp? The Lit-LLaMA format is not supported by llama.cpp.
The /scripts/convert_hf_checkpoint.py script renames some layers to transformer.* and reshapes others (turning [Q1, K1, V1, Q2, K2, V2, ...] into [Q1, Q2, ..., K1, K2, ..., V1, V2, ...]). This breaks the conversion with llama.cpp.
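For illustration, here is a minimal sketch of how such a per-head regrouping can be expressed in PyTorch. The shapes (n_head, head_size, n_embd) are made up for the example and are not taken from the script:

    import torch

    # Hypothetical shapes, for illustration only.
    n_head, head_size, n_embd = 4, 8, 32

    # Rows ordered as [Q1, K1, V1, Q2, K2, V2, ...] (per-head interleaved).
    qkv_interleaved = torch.randn(n_head * 3 * head_size, n_embd)

    # Regroup into [Q1, Q2, ..., K1, K2, ..., V1, V2, ...].
    per_head = qkv_interleaved.view(n_head, 3, head_size, n_embd)
    grouped = per_head.transpose(0, 1).reshape(3 * n_head * head_size, n_embd)
    q, k, v = grouped.chunk(3, dim=0)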
Another option would be a conversion to the HF format (already requested in https://github.com/Lightning-AI/lit-llama/issues/150), since the ggml conversion already supports it: https://github.com/ggerganov/llama.cpp/blob/ac7876ac20124a15a44fd6317721ff1aa2538806/convert.py#L594
Yes, that would work as well. Why not use that format in the first place (i.e. why introduce a format specific to this repo)?
The format is defined by the nn.Module definition. Since we provide our own implementation, the keys are different.
It is fairly easy to convert the weights to the format that will work with llama.cpp.
Just do the exact opposite of what this script does: https://github.com/Lightning-AI/lit-llama/blob/main/scripts/convert_checkpoint.py
In my case, I fine-tuned using lit-llama LoRA, merged the weights, converted them back, quantized with llama.cpp, and it works like a charm :)
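For reference, the forward direction implemented by convert_checkpoint.py (as can be inferred from the reverse script shared below) is a plain key rename for most tensors, plus a concatenation of the Meta wq/wk/wv weights along dim 0 into a single c_attn weight per layer. A rough sketch of that attention step (the helper name here is only illustrative):

    import torch

    def stack_qkv(state_dict, layer_idx, dtype=torch.float32):
        # Meta -> lit-llama: wq, wk, wv are stacked along dim 0 into one c_attn tensor.
        return torch.cat(
            (
                state_dict[f"layers.{layer_idx}.attention.wq.weight"],
                state_dict[f"layers.{layer_idx}.attention.wk.weight"],
                state_dict[f"layers.{layer_idx}.attention.wv.weight"],
            )
        ).to(dtype)

The reverse script below simply undoes this mapping, key by key.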
Hi @sanjarbek16, thanks for sharing your experience. I would like to do something similar to what you have done. Would you mind sharing the script you used to convert it back? Thanks in advance!
    import gc
    from pathlib import Path
    from typing import Dict

    import torch


    def reverse_convert_state_dict(
        state_dict: Dict[str, torch.Tensor], dtype: torch.dtype = torch.float32
    ) -> Dict[str, torch.Tensor]:
        reversed_dict = {}
        reversed_dict["tok_embeddings.weight"] = state_dict["transformer.wte.weight"].to(dtype)
        reversed_dict["output.weight"] = state_dict["lm_head.weight"].to(dtype)
        reversed_dict["norm.weight"] = state_dict["transformer.ln_f.scale"].to(dtype)

        for layer_idx in sorted(set(k.split(".")[2] for k in state_dict if k.startswith("transformer.h"))):
            # attention: c_attn is [wq; wk; wv] stacked along dim 0, so split it back into three equal chunks
            c_attn_weight = state_dict[f"transformer.h.{layer_idx}.attn.c_attn.weight"].to(dtype)
            c_attn_len = c_attn_weight.shape[0] // 3
            reversed_dict[f"layers.{layer_idx}.attention.wq.weight"] = c_attn_weight[:c_attn_len]
            reversed_dict[f"layers.{layer_idx}.attention.wk.weight"] = c_attn_weight[c_attn_len : 2 * c_attn_len]
            reversed_dict[f"layers.{layer_idx}.attention.wv.weight"] = c_attn_weight[2 * c_attn_len :]
            reversed_dict[f"layers.{layer_idx}.attention.wo.weight"] = state_dict[
                f"transformer.h.{layer_idx}.attn.c_proj.weight"
            ].to(dtype)
            # mlp
            reversed_dict[f"layers.{layer_idx}.feed_forward.w1.weight"] = state_dict[
                f"transformer.h.{layer_idx}.mlp.c_fc1.weight"
            ].to(dtype)
            reversed_dict[f"layers.{layer_idx}.feed_forward.w2.weight"] = state_dict[
                f"transformer.h.{layer_idx}.mlp.c_proj.weight"
            ].to(dtype)
            reversed_dict[f"layers.{layer_idx}.feed_forward.w3.weight"] = state_dict[
                f"transformer.h.{layer_idx}.mlp.c_fc2.weight"
            ].to(dtype)
            # rms norm
            reversed_dict[f"layers.{layer_idx}.attention_norm.weight"] = state_dict[
                f"transformer.h.{layer_idx}.rms_1.scale"
            ].to(dtype)
            reversed_dict[f"layers.{layer_idx}.ffn_norm.weight"] = state_dict[
                f"transformer.h.{layer_idx}.rms_2.scale"
            ].to(dtype)
        return reversed_dict


    def reverse_meta_weights_for_nano_model(
        *,
        # note: despite the name, input_dir should point at the merged lit-llama checkpoint file
        input_dir: Path = Path("checkpoints/merged"),
        output_dir: Path = Path("checkpoints/merged/reversed_model/"),
        model_size: str = "7B",  # unused unless the two commented lines below are enabled
        dtype: str = "float32",
    ) -> None:
        # input_dir = input_dir / model_size
        # output_dir = output_dir / model_size
        output_dir.mkdir(parents=True, exist_ok=True)

        dt = getattr(torch, dtype, None)
        if not isinstance(dt, torch.dtype):
            raise ValueError(f"{dtype} is not a valid dtype.")
        dtype = dt

        # Load the converted (lit-llama format) checkpoint
        converted_checkpoint = torch.load(input_dir, map_location="cpu")
        # Reverse the conversion
        reversed_checkpoint = reverse_convert_state_dict(converted_checkpoint, dtype=dtype)
        # Save the reversed checkpoint in the original Meta layout
        torch.save(reversed_checkpoint, output_dir / "consolidated.00.pth")

        # del converted_checkpoint
        # del reversed_checkpoint
        gc.collect()


    if __name__ == "__main__":
        from jsonargparse import CLI

        CLI(reverse_meta_weights_for_nano_model)
The above code worked for me. It does the exact opposite of convert_checkpoint.py. If you change dtype to float16, the resulting file will be the same size as the original LLaMA weights.
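As a usage sketch, assuming the script above is saved as reverse_convert.py (the file name and paths below are placeholders): since it uses jsonargparse's CLI, the keyword arguments become command-line options, e.g.

    python reverse_convert.py \
        --input_dir checkpoints/merged/lit-llama.pth \
        --output_dir checkpoints/merged/reversed_model/ \
        --dtype float16

Note that input_dir should point at the merged lit-llama checkpoint file. To feed the resulting consolidated.00.pth to llama.cpp's convert.py, you will likely also need the matching tokenizer.model and params.json next to it, as in the original Meta checkpoint layout.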
Thank you very much @sanjarbek16, worked pretty well here as well! 👍
Going to try this on the weekend, you are a life saver @sanjarbek16 !!