
Conversion to ggml format

Open H4dr1en opened this issue 2 years ago • 8 comments

Could you provide a script to convert a model from the Lit-LLaMA format back to the original format, so that it can be used in llama.cpp? The Lit-LLaMA format is not supported by llama.cpp.

The /scripts/convert_hf_checkpoint.py script renames some layers to transformer.* and reshapes others (turning [Q1, K1, V1, Q2, K2, V2, ...] into [Q1, Q2, ..., K1, K2, ..., V1, V2, ...]). This breaks the conversion with llama.cpp.
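For reference, the reshape itself looks straightforward to undo; a rough sketch (assuming a fused weight of shape (3*n_embd, n_embd) whose rows are grouped as all Q heads, then all K heads, then all V heads, and ignoring any other permutation the script may apply):

import torch

def interleave_qkv(w: torch.Tensor, n_head: int) -> torch.Tensor:
    # rows of w are grouped as [Q1..Qh, K1..Kh, V1..Vh]; return the per-head
    # interleaved layout [Q1, K1, V1, Q2, K2, V2, ...]
    n_embd = w.shape[1]
    head_size = n_embd // n_head
    w = w.view(3, n_head, head_size, n_embd)   # (qkv group, head, rows per head, n_embd)
    w = w.transpose(0, 1).contiguous()         # (head, qkv group, rows per head, n_embd)
    return w.view(3 * n_embd, n_embd)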

H4dr1en avatar May 24 '23 11:05 H4dr1en

Another option would be a conversion to the HF format (already requested in https://github.com/Lightning-AI/lit-llama/issues/150), since the ggml conversion already supports it: https://github.com/ggerganov/llama.cpp/blob/ac7876ac20124a15a44fd6317721ff1aa2538806/convert.py#L594

carmocca avatar May 24 '23 15:05 carmocca

Yes, that would work as well. Why not use that format in the first place (i.e., why introduce a format specific to this repo)?

H4dr1en avatar May 25 '23 14:05 H4dr1en

The format is defined by the nn.Module definition. Since we provide our own implementation, the keys are different.
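As a toy illustration (the attribute names below are just for illustration, not the real lit-llama modules), state_dict keys come directly from the attribute names in the module definition, so two modules computing the same thing produce different checkpoint keys:

import torch.nn as nn

class MetaStyle(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.tok_embeddings = nn.Embedding(8, 4)  # original-style attribute name

class LitStyle(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.wte = nn.Embedding(8, 4)  # lit-llama-style attribute name

print(list(MetaStyle().state_dict()))  # ['tok_embeddings.weight']
print(list(LitStyle().state_dict()))   # ['wte.weight']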

carmocca avatar May 25 '23 15:05 carmocca

It is fairly easy to convert the weights to a format that will work with llama.cpp.

Just do the exact opposite of what this script does: https://github.com/Lightning-AI/lit-llama/blob/main/scripts/convert_checkpoint.py

In my case, I fine-tuned with lit-llama LoRA, merged the weights, converted them back, quantized with llama.cpp, and it works like a charm :)

sanjarbek16 avatar May 26 '23 01:05 sanjarbek16

Hi @sanjarbek16, thanks for sharing your experience. I would like to do something similar to what you have done. Would you mind sharing the script you used to convert it back? Thanks in advance!

joaopalotti avatar May 29 '23 15:05 joaopalotti

import gc
import torch
from pathlib import Path
from typing import Dict

def reverse_convert_state_dict(state_dict: Dict[str, torch.Tensor], dtype: torch.dtype = torch.float32) -> Dict[str, torch.Tensor]:
    reversed_dict = {}
    # the embedding, LM head and final norm map one-to-one; only the key names change
    reversed_dict["tok_embeddings.weight"] = state_dict["transformer.wte.weight"].to(dtype)
    reversed_dict["output.weight"] = state_dict["lm_head.weight"].to(dtype)
    reversed_dict["norm.weight"] = state_dict["transformer.ln_f.scale"].to(dtype)

    for layer_idx in sorted(set(k.split(".")[2] for k in state_dict if k.startswith("transformer.h"))):
        # attention: lit-llama fuses Q, K and V into a single c_attn weight stacked along dim 0,
        # so splitting it into equal thirds recovers wq, wk and wv
        c_attn_weight = state_dict[f"transformer.h.{layer_idx}.attn.c_attn.weight"].to(dtype)
        c_attn_len = c_attn_weight.shape[0] // 3
        reversed_dict[f"layers.{layer_idx}.attention.wq.weight"] = c_attn_weight[:c_attn_len]
        reversed_dict[f"layers.{layer_idx}.attention.wk.weight"] = c_attn_weight[c_attn_len:2 * c_attn_len]
        reversed_dict[f"layers.{layer_idx}.attention.wv.weight"] = c_attn_weight[2 * c_attn_len:]

        reversed_dict[f"layers.{layer_idx}.attention.wo.weight"] = state_dict[
            f"transformer.h.{layer_idx}.attn.c_proj.weight"
        ].to(dtype)
        # mlp
        reversed_dict[f"layers.{layer_idx}.feed_forward.w1.weight"] = state_dict[
            f"transformer.h.{layer_idx}.mlp.c_fc1.weight"
        ].to(dtype)
        reversed_dict[f"layers.{layer_idx}.feed_forward.w2.weight"] = state_dict[
            f"transformer.h.{layer_idx}.mlp.c_proj.weight"
        ].to(dtype)
        reversed_dict[f"layers.{layer_idx}.feed_forward.w3.weight"] = state_dict[
            f"transformer.h.{layer_idx}.mlp.c_fc2.weight"
        ].to(dtype)
        # rms norm
        reversed_dict[f"layers.{layer_idx}.attention_norm.weight"] = state_dict[f"transformer.h.{layer_idx}.rms_1.scale"].to(dtype)
        reversed_dict[f"layers.{layer_idx}.ffn_norm.weight"] = state_dict[f"transformer.h.{layer_idx}.rms_2.scale"].to(dtype)
    return reversed_dict

def reverse_meta_weights_for_nano_model(
    *,
    input_dir: Path = Path("checkpoints/merged"),
    output_dir: Path = Path("checkpoints/merged/reversed_model/"),
    model_size: str = "7B",
    dtype: str = "float32",
) -> None:
    # input_dir = input_dir / model_size
    # output_dir = output_dir / model_size
    output_dir.mkdir(parents=True, exist_ok=True)

    dt = getattr(torch, dtype, None)
    if not isinstance(dt, torch.dtype):
        raise ValueError(f"{dtype} is not a valid dtype.")
    dtype = dt

    # Load the lit-llama checkpoint; despite the parameter name, input_dir must point to the merged .pth file
    converted_checkpoint = torch.load(input_dir, map_location="cpu")

    # Reverse the conversion
    reversed_checkpoint = reverse_convert_state_dict(converted_checkpoint, dtype=dtype)

    # Save the reversed checkpoint
    torch.save(reversed_checkpoint, output_dir / "consolidated.00.pth")

    # del converted_checkpoint
    # del reversed_checkpoint
    gc.collect()

if __name__ == "__main__":
    from jsonargparse import CLI

    CLI(reverse_meta_weights_for_nano_model)

The above code worked for me. It does the exact opposite of convert_checkpoint.py. If you change dtype to float16, the resulting file will be the same size as the original LLaMA weights.
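If you prefer calling it from Python instead of through the jsonargparse CLI, something like this should work (the paths are placeholders; input_dir must point to the merged checkpoint file):

from pathlib import Path

# assuming the snippet above is saved as reverse_convert.py next to this file
from reverse_convert import reverse_meta_weights_for_nano_model

reverse_meta_weights_for_nano_model(
    input_dir=Path("checkpoints/merged/lit-llama-merged.pth"),  # placeholder: merged lit-llama checkpoint
    output_dir=Path("checkpoints/merged/reversed_model"),
    dtype="float16",
)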

sanjarbek16 avatar May 30 '23 05:05 sanjarbek16

Thank you very much @sanjarbek16, worked pretty well here as well! 👍

joaopalotti avatar May 30 '23 21:05 joaopalotti

Going to try this on the weekend, you are a life saver @sanjarbek16 !!

ExcPoint avatar Sep 08 '23 23:09 ExcPoint