ggml
[GPT-2] Convert h5 to ggml
I adapted the GPT-J example script to convert a Portuguese fine-tuned GPT2 model in h5 format to ggml.
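The adapted script ends up being a fairly direct port of the GPT-J one. A minimal sketch of the tensor-writing part (the magic/hparams/vocab header is elided here, the per-tensor encoding should be taken verbatim from the existing converter, and the paths are just my local model directory):

```python
import struct
import numpy as np
from transformers import GPT2Model

dir_model = "/Volumes/Documentos/Models/gpt2-small-portuguese"
fname_out = dir_model + "/ggml-model-f32.bin"

model = GPT2Model.from_pretrained(dir_model)

with open(fname_out, "wb") as fout:
    # ... write magic, hparams and vocab here, copied from the GPT-J converter ...

    for name, tensor in model.state_dict().items():
        data = tensor.squeeze().numpy().astype(np.float32)
        print(f"Processing variable: {name} with shape: {data.shape}")

        # the causal-mask buffers are rebuilt at load time, they are not weights
        if name.endswith("attn.bias") or name.endswith("attn.masked_bias"):
            print(f"Skipping variable: {name}")
            continue

        # per-tensor header, same layout as the GPT-J converter:
        # n_dims, name length, data type (0 = f32), dims, name bytes, raw data
        n_dims = len(data.shape)
        name_bytes = name.encode("utf-8")
        fout.write(struct.pack("iii", n_dims, len(name_bytes), 0))
        for i in range(n_dims):
            fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
        fout.write(name_bytes)
        data.tofile(fout)

print("Done. Output file:", fname_out)
```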
Full conversion log:

```
Some weights of the model checkpoint at /Volumes/Documentos/Models/gpt2-small-portuguese were not used when initializing GPT2Model: ['lm_head.weight']
- This IS expected if you are initializing GPT2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPT2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Processing variable: wte.weight with shape: (50257, 768)
Processing variable: wpe.weight with shape: (1024, 768)
Processing variable: h.0.ln_1.weight with shape: (768,)
Processing variable: h.0.ln_1.bias with shape: (768,)
Processing variable: h.0.attn.bias with shape: (1024, 1024)
Skipping variable: h.0.attn.bias
Processing variable: h.0.attn.masked_bias with shape: ()
Skipping variable: h.0.attn.masked_bias
Processing variable: h.0.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.0.attn.c_attn.bias with shape: (2304,)
Processing variable: h.0.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.0.attn.c_proj.bias with shape: (768,)
Processing variable: h.0.ln_2.weight with shape: (768,)
Processing variable: h.0.ln_2.bias with shape: (768,)
Processing variable: h.0.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.0.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.0.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.0.mlp.c_proj.bias with shape: (768,)
Processing variable: h.1.ln_1.weight with shape: (768,)
Processing variable: h.1.ln_1.bias with shape: (768,)
Processing variable: h.1.attn.bias with shape: (1024, 1024)
Skipping variable: h.1.attn.bias
Processing variable: h.1.attn.masked_bias with shape: ()
Skipping variable: h.1.attn.masked_bias
Processing variable: h.1.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.1.attn.c_attn.bias with shape: (2304,)
Processing variable: h.1.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.1.attn.c_proj.bias with shape: (768,)
Processing variable: h.1.ln_2.weight with shape: (768,)
Processing variable: h.1.ln_2.bias with shape: (768,)
Processing variable: h.1.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.1.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.1.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.1.mlp.c_proj.bias with shape: (768,)
Processing variable: h.2.ln_1.weight with shape: (768,)
Processing variable: h.2.ln_1.bias with shape: (768,)
Processing variable: h.2.attn.bias with shape: (1024, 1024)
Skipping variable: h.2.attn.bias
Processing variable: h.2.attn.masked_bias with shape: ()
Skipping variable: h.2.attn.masked_bias
Processing variable: h.2.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.2.attn.c_attn.bias with shape: (2304,)
Processing variable: h.2.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.2.attn.c_proj.bias with shape: (768,)
Processing variable: h.2.ln_2.weight with shape: (768,)
Processing variable: h.2.ln_2.bias with shape: (768,)
Processing variable: h.2.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.2.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.2.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.2.mlp.c_proj.bias with shape: (768,)
Processing variable: h.3.ln_1.weight with shape: (768,)
Processing variable: h.3.ln_1.bias with shape: (768,)
Processing variable: h.3.attn.bias with shape: (1024, 1024)
Skipping variable: h.3.attn.bias
Processing variable: h.3.attn.masked_bias with shape: ()
Skipping variable: h.3.attn.masked_bias
Processing variable: h.3.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.3.attn.c_attn.bias with shape: (2304,)
Processing variable: h.3.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.3.attn.c_proj.bias with shape: (768,)
Processing variable: h.3.ln_2.weight with shape: (768,)
Processing variable: h.3.ln_2.bias with shape: (768,)
Processing variable: h.3.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.3.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.3.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.3.mlp.c_proj.bias with shape: (768,)
Processing variable: h.4.ln_1.weight with shape: (768,)
Processing variable: h.4.ln_1.bias with shape: (768,)
Processing variable: h.4.attn.bias with shape: (1024, 1024)
Skipping variable: h.4.attn.bias
Processing variable: h.4.attn.masked_bias with shape: ()
Skipping variable: h.4.attn.masked_bias
Processing variable: h.4.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.4.attn.c_attn.bias with shape: (2304,)
Processing variable: h.4.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.4.attn.c_proj.bias with shape: (768,)
Processing variable: h.4.ln_2.weight with shape: (768,)
Processing variable: h.4.ln_2.bias with shape: (768,)
Processing variable: h.4.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.4.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.4.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.4.mlp.c_proj.bias with shape: (768,)
Processing variable: h.5.ln_1.weight with shape: (768,)
Processing variable: h.5.ln_1.bias with shape: (768,)
Processing variable: h.5.attn.bias with shape: (1024, 1024)
Skipping variable: h.5.attn.bias
Processing variable: h.5.attn.masked_bias with shape: ()
Skipping variable: h.5.attn.masked_bias
Processing variable: h.5.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.5.attn.c_attn.bias with shape: (2304,)
Processing variable: h.5.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.5.attn.c_proj.bias with shape: (768,)
Processing variable: h.5.ln_2.weight with shape: (768,)
Processing variable: h.5.ln_2.bias with shape: (768,)
Processing variable: h.5.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.5.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.5.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.5.mlp.c_proj.bias with shape: (768,)
Processing variable: h.6.ln_1.weight with shape: (768,)
Processing variable: h.6.ln_1.bias with shape: (768,)
Processing variable: h.6.attn.bias with shape: (1024, 1024)
Skipping variable: h.6.attn.bias
Processing variable: h.6.attn.masked_bias with shape: ()
Skipping variable: h.6.attn.masked_bias
Processing variable: h.6.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.6.attn.c_attn.bias with shape: (2304,)
Processing variable: h.6.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.6.attn.c_proj.bias with shape: (768,)
Processing variable: h.6.ln_2.weight with shape: (768,)
Processing variable: h.6.ln_2.bias with shape: (768,)
Processing variable: h.6.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.6.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.6.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.6.mlp.c_proj.bias with shape: (768,)
Processing variable: h.7.ln_1.weight with shape: (768,)
Processing variable: h.7.ln_1.bias with shape: (768,)
Processing variable: h.7.attn.bias with shape: (1024, 1024)
Skipping variable: h.7.attn.bias
Processing variable: h.7.attn.masked_bias with shape: ()
Skipping variable: h.7.attn.masked_bias
Processing variable: h.7.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.7.attn.c_attn.bias with shape: (2304,)
Processing variable: h.7.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.7.attn.c_proj.bias with shape: (768,)
Processing variable: h.7.ln_2.weight with shape: (768,)
Processing variable: h.7.ln_2.bias with shape: (768,)
Processing variable: h.7.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.7.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.7.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.7.mlp.c_proj.bias with shape: (768,)
Processing variable: h.8.ln_1.weight with shape: (768,)
Processing variable: h.8.ln_1.bias with shape: (768,)
Processing variable: h.8.attn.bias with shape: (1024, 1024)
Skipping variable: h.8.attn.bias
Processing variable: h.8.attn.masked_bias with shape: ()
Skipping variable: h.8.attn.masked_bias
Processing variable: h.8.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.8.attn.c_attn.bias with shape: (2304,)
Processing variable: h.8.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.8.attn.c_proj.bias with shape: (768,)
Processing variable: h.8.ln_2.weight with shape: (768,)
Processing variable: h.8.ln_2.bias with shape: (768,)
Processing variable: h.8.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.8.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.8.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.8.mlp.c_proj.bias with shape: (768,)
Processing variable: h.9.ln_1.weight with shape: (768,)
Processing variable: h.9.ln_1.bias with shape: (768,)
Processing variable: h.9.attn.bias with shape: (1024, 1024)
Skipping variable: h.9.attn.bias
Processing variable: h.9.attn.masked_bias with shape: ()
Skipping variable: h.9.attn.masked_bias
Processing variable: h.9.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.9.attn.c_attn.bias with shape: (2304,)
Processing variable: h.9.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.9.attn.c_proj.bias with shape: (768,)
Processing variable: h.9.ln_2.weight with shape: (768,)
Processing variable: h.9.ln_2.bias with shape: (768,)
Processing variable: h.9.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.9.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.9.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.9.mlp.c_proj.bias with shape: (768,)
Processing variable: h.10.ln_1.weight with shape: (768,)
Processing variable: h.10.ln_1.bias with shape: (768,)
Processing variable: h.10.attn.bias with shape: (1024, 1024)
Skipping variable: h.10.attn.bias
Processing variable: h.10.attn.masked_bias with shape: ()
Skipping variable: h.10.attn.masked_bias
Processing variable: h.10.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.10.attn.c_attn.bias with shape: (2304,)
Processing variable: h.10.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.10.attn.c_proj.bias with shape: (768,)
Processing variable: h.10.ln_2.weight with shape: (768,)
Processing variable: h.10.ln_2.bias with shape: (768,)
Processing variable: h.10.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.10.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.10.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.10.mlp.c_proj.bias with shape: (768,)
Processing variable: h.11.ln_1.weight with shape: (768,)
Processing variable: h.11.ln_1.bias with shape: (768,)
Processing variable: h.11.attn.bias with shape: (1024, 1024)
Skipping variable: h.11.attn.bias
Processing variable: h.11.attn.masked_bias with shape: ()
Skipping variable: h.11.attn.masked_bias
Processing variable: h.11.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.11.attn.c_attn.bias with shape: (2304,)
Processing variable: h.11.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.11.attn.c_proj.bias with shape: (768,)
Processing variable: h.11.ln_2.weight with shape: (768,)
Processing variable: h.11.ln_2.bias with shape: (768,)
Processing variable: h.11.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.11.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.11.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.11.mlp.c_proj.bias with shape: (768,)
Processing variable: ln_f.weight with shape: (768,)
Processing variable: ln_f.bias with shape: (768,)
Done. Output file: /Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin
```
At this point it isn't working yet; when I try inference with the converted model I get this error:
gpt2_model_load: unknown tensor 'wte.weight' in model file
From what I saw, this model names its variables differently from the original model:
variable: wte.weight shape: (50257, 768)
variable: wpe.weight shape: (1024, 768)
variable: h.0.ln_1.weight shape: (768,)
variable: h.0.ln_1.bias shape: (768,)
variable: h.0.attn.bias shape: (1024, 1024)
variable: h.0.attn.masked_bias shape: ()
variable: h.0.attn.c_attn.weight shape: (768, 2304)
variable: h.0.attn.c_attn.bias shape: (2304,)
variable: h.0.attn.c_proj.weight shape: (768, 768)
variable: h.0.attn.c_proj.bias shape: (768,)
variable: h.0.ln_2.weight shape: (768,)
variable: h.0.ln_2.bias shape: (768,)
variable: h.0.mlp.c_fc.weight shape: (768, 3072)
variable: h.0.mlp.c_fc.bias shape: (3072,)
variable: h.0.mlp.c_proj.weight shape: (3072, 768)
variable: h.0.mlp.c_proj.bias shape: (768,)
It worked after mapping the tensor names correctly :D
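The mismatch is just a naming convention: the Hugging Face checkpoint uses names like h.0.attn.c_attn.weight, while the gpt-2 example works with TensorFlow-checkpoint-style names like model/h0/attn/c_attn/w. A minimal sketch of the correspondence (the target names below are an assumption and should be checked against the .cpp):

```python
# Hugging Face -> TF-checkpoint-style rename (target names assumed,
# verify them against the tensor names used in examples/gpt-2/main.cpp)
def map_name(hf_name: str) -> str:
    fixed = {
        "wte.weight": "model/wte",
        "wpe.weight": "model/wpe",
        "ln_f.weight": "model/ln_f/g",
        "ln_f.bias": "model/ln_f/b",
    }
    if hf_name in fixed:
        return fixed[hf_name]

    # per-layer tensors: "h.<i>.<rest>" -> "model/h<i>/<rest>"
    _, layer, rest = hf_name.split(".", 2)
    rest = (rest
            .replace("ln_1.weight", "ln_1/g").replace("ln_1.bias", "ln_1/b")
            .replace("ln_2.weight", "ln_2/g").replace("ln_2.bias", "ln_2/b")
            .replace(".weight", "/w").replace(".bias", "/b")
            .replace(".", "/"))
    return f"model/h{layer}/{rest}"

# map_name("h.0.attn.c_attn.weight") -> "model/h0/attn/c_attn/w"
```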
result:
main: seed = 1678457502
gpt2_model_load: loading model from '/Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx = 1024
gpt2_model_load: n_embd = 768
gpt2_model_load: n_head = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: f16 = 0
gpt2_model_load: ggml ctx size = 546.74 MB
gpt2_model_load: memory size = 72.00 MB, n_mem = 12288
gpt2_model_load: model size = 474.70 MB
main: number of tokens in prompt = 14
O brasil é o maior país da america latina, que também possui em comum o português e o
main: mem per token = 2004636 bytes
main: load time = 242.43 ms
main: sample time = 1.70 ms
main: predict time = 153.59 ms / 6.68 ms per token
main: total time = 423.49 ms
Cool!
However, instead of changing the tensor names in the .cpp, you have to change the names in the Python script to match those already used in the .cpp. Otherwise, you break compatibility with the existing .ckpt conversion.
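Something along these lines in the conversion loop (a sketch; map_name is the hypothetical helper sketched in the comment above, and write_tensor stands in for the existing per-tensor writing code):

```python
for hf_name, tensor in model.state_dict().items():
    if hf_name.endswith("attn.bias") or hf_name.endswith("attn.masked_bias"):
        continue  # mask buffers, not weights

    # translate to the name main.cpp already expects, then write as before
    ggml_name = map_name(hf_name)  # e.g. "h.0.ln_1.weight" -> "model/h0/ln_1/g"
    write_tensor(fout, ggml_name, tensor.squeeze().numpy())
```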
Done 👍