
How to use a .safetensors file in this library?

kartonrad opened this issue Aug 17 '24 · 6 comments

The ecosystem seems to have moved on from .pt and the PyTorch format in general... Is there any way we can use .safetensors files to generate .ggml files?

kartonrad commented Aug 17 '24

+1 I am trying to convert this model: https://huggingface.co/biodatlab/distill-whisper-th-medium, by reading it into a model instance first. I have tried a few approaches but still cannot get it to work; I have attempted safetensors > pt > ggml and safetensors > bin > ggml.

After converting to pt or bin, information that the conversion scripts need is still missing, so I have not found a way to convert yet.
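
For anyone stuck at the same point, the raw weights can at least be inspected to see which tensors and shapes are present (a quick sketch using the safetensors library; the path is just wherever your downloaded model folder is):

from safetensors.torch import load_file

# Load the Hugging Face checkpoint and list tensor names, shapes and dtypes,
# which shows what a conversion script has to work with.
tensors = load_file("distill-whisper-th-medium/model.safetensors", device="cpu")
for name, t in sorted(tensors.items()):
    print(name, tuple(t.shape), t.dtype)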

Update:

I finally understood how to convert and use these models after reading the source code and exploring the repo. A safetensors model (Hugging Face format) needs to be converted to ggml, and the conversion script is provided inside the repo.

Thanks for making these tools, which benefit many people and use cases.

jingcodeguy commented Aug 22 '24

I am using vosk in the meantime xD

Got it bundled into a Tauri/Rust Android app.

But I am interested in how a distilled, dedicated German whisper model would perform.

So far, it's not been nearly as reliable as good old vosk, and I am by no means qualified in AI, so I couldn't fix it.

kartonrad commented Aug 22 '24

(Quoting jingcodeguy's comment above.)

Hello! I'm struggling with the same issue. Could you please enlighten me on exactly which script you used, and the command? Thank you!!

graves commented Dec 08 '24

NEVERMIND 😩 It turns out the config.json in the model I'm using sets the top-level max_length (I'm guessing this is the model's maximum decoding length? context window, maybe?) to null. Simply setting max_length to 448, the value found in generation_config.json, allowed me to build using the following command:

python models/convert-h5-to-ggml.py whisper-base-hungarian_v1 ./whisper .
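
For reference, here is a minimal sketch of that config patch, assuming the usual Hugging Face layout with config.json and generation_config.json side by side (the directory name is my model's; substitute your own):

import json
from pathlib import Path

model_dir = Path("whisper-base-hungarian_v1")

# generation_config.json usually carries the real decoder max_length (448 here).
gen_cfg = json.loads((model_dir / "generation_config.json").read_text(encoding="utf8"))

# Patch the top-level max_length in config.json if it is null or missing.
cfg_path = model_dir / "config.json"
cfg = json.loads(cfg_path.read_text(encoding="utf8"))
if cfg.get("max_length") is None:
    cfg["max_length"] = gen_cfg.get("max_length", 448)
    cfg_path.write_text(json.dumps(cfg, indent=2), encoding="utf8")
    print("patched max_length ->", cfg["max_length"])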

Be careful about this. At first I attempted to use the model_max_length found in the tokenizer config files, i.e. tokenizer_config.json. That value (1024) ends up as n_text_ctx in the ggml header, while the decoder's positional-embedding tensor actually has 448 rows, so loading fails with the following error:

λ ./main -m whisper-base-hungarian_v1.ggml -f output.wav --output-srt --print-colors
whisper_init_from_file_with_params_no_state: loading model from 'whisper-base-hungarian_v1.ggml'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 3
whisper_init_with_params_no_state: backends   = 3
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 1024
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_default_buffer_type: using device Metal (Apple M2)
whisper_model_load:    Metal total size =   148.55 MB
whisper_model_load: tensor 'decoder.positional_embedding' has wrong size in model file
whisper_model_load: shape: [512, 448, 1], expected: [512, 1024, 1]
whisper_init_with_params_no_state: failed to load model
error: failed to initialize whisper context

graves commented Dec 08 '24

Try converting the safetensors file to ggml first with this script, which is adapted from the h5 conversion script:

#!/usr/bin/env python3
"""
Convert Hugging Face fine-tuned models (stored as safetensors)
to ggml format.

Usage:

  # Clone the repos if you haven't already:
  git clone https://github.com/openai/whisper
  git clone https://github.com/ggerganov/whisper.cpp
  git clone https://huggingface.co/openai/whisper-medium

  # Convert the model
  python3 ./whisper.cpp/models/convert-safetensors-to-ggml.py \
          ./whisper-medium/ ./whisper/dir/ ./output_dir [use-f32]

For more info see:
  https://github.com/ggerganov/whisper.cpp/issues/157
"""

import os
import sys
import struct
import json
import torch
import numpy as np
from pathlib import Path
from safetensors.torch import load_file

# The mapping used to convert parameter names from the Hugging Face format to the ggml format.
conv_map = {
    'self_attn.k_proj'              : 'attn.key',
    'self_attn.q_proj'              : 'attn.query',
    'self_attn.v_proj'              : 'attn.value',
    'self_attn.out_proj'            : 'attn.out',
    'self_attn_layer_norm'          : 'attn_ln',
    'encoder_attn.q_proj'           : 'cross_attn.query',
    'encoder_attn.v_proj'           : 'cross_attn.value',
    'encoder_attn.out_proj'         : 'cross_attn.out',
    'encoder_attn_layer_norm'       : 'cross_attn_ln',
    'fc1'                           : 'mlp.0',
    'fc2'                           : 'mlp.2',
    'final_layer_norm'              : 'mlp_ln',
    'encoder.layer_norm.bias'       : 'encoder.ln_post.bias',
    'encoder.layer_norm.weight'     : 'encoder.ln_post.weight',
    'encoder.embed_positions.weight': 'encoder.positional_embedding',
    'decoder.layer_norm.bias'       : 'decoder.ln.bias',
    'decoder.layer_norm.weight'     : 'decoder.ln.weight',
    'decoder.embed_positions.weight': 'decoder.positional_embedding',
    'decoder.embed_tokens.weight'   : 'decoder.token_embedding.weight',
    'proj_out.weight'               : 'decoder.proj.weight',
}

# Reference: https://github.com/openai/gpt-2/blob/master/src/encoder.py
def bytes_to_unicode():
    """
    Returns dictionary mapping each byte to a unicode string.
    This is used for reversible BPE codes.
    """
    bs = list(range(ord("!"), ord("~") + 1)) + \
         list(range(ord("¡"), ord("¬") + 1)) + \
         list(range(ord("®"), ord("ÿ") + 1))
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8 + n)
            n += 1
    cs = [chr(n) for n in cs]
    return dict(zip(bs, cs))

if len(sys.argv) < 4:
    print("Usage: convert-safetensors-to-ggml.py dir_model path-to-whisper-repo dir-output [use-f32]\n")
    sys.exit(1)

# Set up model, whisper repo and output directories.
dir_model   = Path(sys.argv[1])
dir_whisper = Path(sys.argv[2])
dir_out     = Path(sys.argv[3])

# Load the model config. (The tokenizer vocabulary is loaded further below.)
hparams = json.load((dir_model / "config.json").open("r", encoding="utf8"))

# Some models might be missing the 'max_length' parameter. Fall back to 'max_target_positions'.
if "max_length" not in hparams:
    hparams["max_length"] = hparams.get("max_target_positions", 448)

# Load model weights from the safetensors file.
pt_file = dir_model / "model.safetensors"
if not pt_file.exists():
    print("Error: model.safetensors not found in", pt_file)
    sys.exit(1)
# The safetensors library loads weight tensors in a dictionary.
list_vars = load_file(str(pt_file), device="cpu")

# Load mel filters based on the number of mel bins.
n_mels = hparams["num_mel_bins"]
with np.load(os.path.join(dir_whisper, "whisper/assets", "mel_filters.npz")) as f:
    filters = torch.from_numpy(f[f"mel_{n_mels}"])

# Assuming that the tokenizer is stored in the same folder as the model.
dir_tokenizer = dir_model

# Default output filename (f16); overridden below if f32 conversion is requested.
fname_out = dir_out / "ggml-model.bin"

tokens = json.load(open(dir_tokenizer / "vocab.json", "r", encoding="utf8"))

# Use 16-bit or 32-bit floats.
use_f16 = True
if len(sys.argv) > 4:
    use_f16 = False
    fname_out = dir_out / "ggml-model-f32.bin"

fout = open(fname_out, "wb")

# Write the header information.
fout.write(struct.pack("i", 0x67676d6c))  # magic: ggml in hex
fout.write(struct.pack("i", hparams["vocab_size"]))
fout.write(struct.pack("i", hparams["max_source_positions"]))
fout.write(struct.pack("i", hparams["d_model"]))
fout.write(struct.pack("i", hparams["encoder_attention_heads"]))
fout.write(struct.pack("i", hparams["encoder_layers"]))
fout.write(struct.pack("i", hparams["max_length"]))
fout.write(struct.pack("i", hparams["d_model"]))
fout.write(struct.pack("i", hparams["decoder_attention_heads"]))
fout.write(struct.pack("i", hparams["decoder_layers"]))
fout.write(struct.pack("i", hparams["num_mel_bins"]))
fout.write(struct.pack("i", use_f16))

# Write the mel filter dimensions and the filter data.
fout.write(struct.pack("i", filters.shape[0]))
fout.write(struct.pack("i", filters.shape[1]))
for i in range(filters.shape[0]):
    for j in range(filters.shape[1]):
        fout.write(struct.pack("f", filters[i][j]))

# Write the vocabulary information.
byte_encoder = bytes_to_unicode()
byte_decoder = {v: k for k, v in byte_encoder.items()}

fout.write(struct.pack("i", len(tokens)))
tokens = sorted(tokens.items(), key=lambda x: x[1])
for key in tokens:
    # Convert the token string to its byte representation.
    text = bytearray([byte_decoder[c] for c in key[0]])
    fout.write(struct.pack("i", len(text)))
    fout.write(text)

# Process and write each variable in the state dictionary.
for name in list_vars.keys():
    # Some variables are skipped (e.g. proj_out.weight).
    if name == "proj_out.weight":
        print("Skipping", name)
        continue

    src = name
    # Strip the leading "model." prefix from the Hugging Face parameter name
    # (the un-prefixed "proj_out.weight" was already skipped above).
    nn = name.split(".")[1:]

    if nn[1] == "layers":
        nn[1] = "blocks"
        if ".".join(nn[3:-1]) == "encoder_attn.k_proj":
            mapped = "attn.key" if nn[0] == "encoder" else "cross_attn.key"
        else:
            mapped = conv_map[".".join(nn[3:-1])]
        name = ".".join(nn[:3] + [mapped] + nn[-1:])
    else:
        name = ".".join(nn)
        name = conv_map[name] if name in conv_map else name

    print(src, " -> ", name)
    # Convert the tensor to a NumPy array.
    data = list_vars[src].squeeze().numpy()
    data = data.astype(np.float16)

    # Reshape convolutional biases if needed.
    if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
        data = data.reshape(data.shape[0], 1)
        print("  Reshaped variable:", name, "to shape:", data.shape)

    n_dims = len(data.shape)
    print(name, n_dims, data.shape)

    # Determine whether to use float16 or fall back to float32 for small tensors.
    ftype = 1  # 1 -> float16, 0 -> float32
    if use_f16:
        if n_dims < 2 or \
           name in ["encoder.conv1.bias", "encoder.conv2.bias", 
                    "encoder.positional_embedding", "decoder.positional_embedding"]:
            print("  Converting to float32")
            data = data.astype(np.float32)
            ftype = 0
    else:
        data = data.astype(np.float32)
        ftype = 0

    # Write the header for this variable.
    str_ = name.encode("utf-8")
    fout.write(struct.pack("iii", n_dims, len(str_), ftype))
    for i in range(n_dims):
        fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
    fout.write(str_)

    # Write the tensor data.
    data.tofile(fout)

fout.close()

print("Done. Output file:", fname_out)
print("")

jo32 commented Feb 20 '25