
[GPT-2] Convert h5 to ggml

ocordeiro opened this issue on Mar 10, 2023 · 3 comments

I adapted the GPT-J example script to convert a Portuguese fine-tuned GPT-2 model in h5 format to ggml.
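
For context, the core of that adaptation looks roughly like the following. This is a minimal sketch, assuming the checkpoint loads through `transformers` (for a TensorFlow `tf_model.h5` you would pass `from_tf=True`); the header layout mirrors the existing ggml conversion scripts, and vocabulary serialization is omitted for brevity:

```python
import struct
import numpy as np
from transformers import GPT2Model

# Illustrative path from this thread; the real script also writes
# the tokenizer vocabulary between the header and the tensors.
model_dir = "/Volumes/Documentos/Models/gpt2-small-portuguese"

model = GPT2Model.from_pretrained(model_dir)
hp = model.config

with open(model_dir + "/ggml-model-f32.bin", "wb") as fout:
    fout.write(struct.pack("i", 0x67676D6C))   # magic: "ggml"
    fout.write(struct.pack("i", hp.vocab_size))
    fout.write(struct.pack("i", hp.n_positions))
    fout.write(struct.pack("i", hp.n_embd))
    fout.write(struct.pack("i", hp.n_head))
    fout.write(struct.pack("i", hp.n_layer))
    fout.write(struct.pack("i", 0))            # 0 = f32 weights

    for name, tensor in model.state_dict().items():
        data = tensor.numpy()
        print("Processing variable:", name, "with shape:", data.shape)
        # Constant attention-mask buffers are rebuilt at load time, so skip them.
        if name.endswith("attn.bias") or name.endswith("attn.masked_bias"):
            print("Skipping variable:", name)
            continue
        data = data.astype(np.float32)
        sname = name.encode("utf-8")  # written as-is; see the naming issue below
        fout.write(struct.pack("iii", len(data.shape), len(sname), 0))
        for dim in reversed(data.shape):
            fout.write(struct.pack("i", dim))
        fout.write(sname)
        data.tofile(fout)
```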

Full conversion log:

```
Some weights of the model checkpoint at /Volumes/Documentos/Models/gpt2-small-portuguese were not used when initializing GPT2Model: ['lm_head.weight']
- This IS expected if you are initializing GPT2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPT2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Processing variable: wte.weight with shape: (50257, 768)
Processing variable: wpe.weight with shape: (1024, 768)
Processing variable: h.0.ln_1.weight with shape: (768,)
Processing variable: h.0.ln_1.bias with shape: (768,)
Processing variable: h.0.attn.bias with shape: (1024, 1024)
Skipping variable: h.0.attn.bias
Processing variable: h.0.attn.masked_bias with shape: ()
Skipping variable: h.0.attn.masked_bias
Processing variable: h.0.attn.c_attn.weight with shape: (768, 2304)
Processing variable: h.0.attn.c_attn.bias with shape: (2304,)
Processing variable: h.0.attn.c_proj.weight with shape: (768, 768)
Processing variable: h.0.attn.c_proj.bias with shape: (768,)
Processing variable: h.0.ln_2.weight with shape: (768,)
Processing variable: h.0.ln_2.bias with shape: (768,)
Processing variable: h.0.mlp.c_fc.weight with shape: (768, 3072)
Processing variable: h.0.mlp.c_fc.bias with shape: (3072,)
Processing variable: h.0.mlp.c_proj.weight with shape: (3072, 768)
Processing variable: h.0.mlp.c_proj.bias with shape: (768,)
[... the same pattern repeats for layers h.1 through h.11 ...]
Processing variable: ln_f.weight with shape: (768,)
Processing variable: ln_f.bias with shape: (768,)
Done. Output file: /Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin
```

It's not working yet, though: when I try inference with the converted model, I get this error:

```
gpt2_model_load: unknown tensor 'wte.weight' in model file
```

ocordeiro commented on Mar 10, 2023

From what I saw, this model names its variables differently from the original model (see the mapping sketch after the listing):

```
variable: wte.weight              shape: (50257, 768)
variable: wpe.weight              shape: (1024, 768)
variable: h.0.ln_1.weight         shape: (768,)
variable: h.0.ln_1.bias           shape: (768,)
variable: h.0.attn.bias           shape: (1024, 1024)
variable: h.0.attn.masked_bias    shape: ()
variable: h.0.attn.c_attn.weight  shape: (768, 2304)
variable: h.0.attn.c_attn.bias    shape: (2304,)
variable: h.0.attn.c_proj.weight  shape: (768, 768)
variable: h.0.attn.c_proj.bias    shape: (768,)
variable: h.0.ln_2.weight         shape: (768,)
variable: h.0.ln_2.bias           shape: (768,)
variable: h.0.mlp.c_fc.weight     shape: (768, 3072)
variable: h.0.mlp.c_fc.bias       shape: (3072,)
variable: h.0.mlp.c_proj.weight   shape: (3072, 768)
variable: h.0.mlp.c_proj.bias     shape: (768,)
```
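
For comparison, the gpt-2 example stores tensors under the TensorFlow-checkpoint naming scheme (`model/wte`, `model/h0/ln_1/g`, and so on; the authoritative list is the set of tensor names in the example's main.cpp). A hypothetical helper that maps the Hugging Face names above onto that scheme could look like this; treat it as a sketch to verify against main.cpp, and note it is only applied to tensors that actually get written (the `attn.bias`/`attn.masked_bias` buffers are skipped):

```python
import re

def hf_to_ggml_name(name: str) -> str:
    """Map a Hugging Face GPT-2 state-dict name to the TF-checkpoint-style
    name used by the ggml gpt-2 example (sketch; verify against main.cpp)."""
    name = name.replace(".weight", "/w").replace(".bias", "/b")
    name = re.sub(r"^h\.(\d+)\.", r"h\1/", name)   # h.0. -> h0/
    name = name.replace(".", "/")
    # layer norms use g (gain) instead of w for the scale parameter
    name = re.sub(r"(ln_\d|ln_f)/w$", r"\1/g", name)
    # the embedding tables carry no /w suffix in the TF checkpoint
    name = re.sub(r"^(wte|wpe)/w$", r"\1", name)
    return "model/" + name

print(hf_to_ggml_name("wte.weight"))              # -> model/wte
print(hf_to_ggml_name("h.0.attn.c_attn.weight"))  # -> model/h0/attn/c_attn/w
print(hf_to_ggml_name("ln_f.bias"))               # -> model/ln_f/b
```

(`lm_head.weight` needs no mapping: GPT-2 ties it to `wte`, which is why the checkpoint loader reported it as unused.)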

ocordeiro commented on Mar 10, 2023

It worked after mapping the tensor names correctly :D

Result:

```
main: seed = 1678457502
gpt2_model_load: loading model from '/Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 1024
gpt2_model_load: n_embd  = 768
gpt2_model_load: n_head  = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: f16     = 0
gpt2_model_load: ggml ctx size = 546.74 MB
gpt2_model_load: memory size =    72.00 MB, n_mem = 12288
gpt2_model_load: model size  =   474.70 MB
main: number of tokens in prompt = 14

O brasil é o maior país da america latina, que também possui em comum o português e o

main: mem per token =  2004636 bytes
main:     load time =   242.43 ms
main:   sample time =     1.70 ms
main:  predict time =   153.59 ms / 6.68 ms per token
main:    total time =   423.49 ms
```

(The generated Portuguese translates roughly as: "Brazil is the largest country in Latin America, which also has in common Portuguese and the...")

ocordeiro commented on Mar 10, 2023

Cool! However, instead of changing the tensor names in the .cpp, you have to change the names in the Python script to match the ones already used in the .cpp. Otherwise, you break compatibility with the existing .ckpt conversion.
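
In the conversion loop, that rename happens on the Python side just before each tensor is written; a minimal sketch, reusing the hypothetical `hf_to_ggml_name` helper above:

```python
for name, tensor in model.state_dict().items():
    if name.endswith("attn.bias") or name.endswith("attn.masked_bias"):
        continue
    data = tensor.numpy().astype(np.float32)
    # Rename here so main.cpp (and the .ckpt conversion path) stay untouched.
    sname = hf_to_ggml_name(name).encode("utf-8")
    fout.write(struct.pack("iii", len(data.shape), len(sname), 0))
    for dim in reversed(data.shape):
        fout.write(struct.pack("i", dim))
    fout.write(sname)
    data.tofile(fout)
```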

ggerganov commented on Mar 17, 2023

Done 👍

ocordeiro commented on Mar 28, 2023