DialoGPT
Error in large model's config
Hello. The large model trained from scratch has a wrong config, resulting in the errors below:
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel: Missing key(s) in state_dict: "transformer.h.36.ln_1.weight", "transformer.h.36.ln_1.bias", ... , "transformer.h.47.mlp.c_proj.weight", "transformer.h.47.mlp.c_proj.bias".
size mismatch for transformer.wte.weight: copying a param with shape torch.Size([50257, 1280]) from checkpoint, the shape in current model is torch.Size([50257, 1600]).
size mismatch for transformer.wpe.weight: copying a param with shape torch.Size([1024, 1280]) from checkpoint, the shape in current model is torch.Size([1024, 1600]).
size mismatch for transformer.h.0.ln_1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([1600]).
size mismatch for transformer.h.0.ln_1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([1600]).
This happens when loading the config and then loading the .pkl pretrained file (roughly as in the sketch after the examples below). If I change the config to 1280 embeddings it works, but I get strange interactions: after some chatting the model starts repeating the same answer over and over. Maybe the issue is in my decoding script (it's handwritten, not yours), but I keep getting mumbles such as:
S a, u m? M.
or repeating answers such as:
My favorite color is red, red, red, red, red
or
i'm from r all. i'm from r all. i'm from r all.
etc.
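For reference, here is roughly how I load it; `large_ft.pkl` is just a placeholder for my checkpoint file, and the 36-layer / 1280-dim / 20-head values are what the state dict seems to expect:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# DialoGPT-large follows the GPT-2 "large" architecture: 36 layers, 1280-dim
# embeddings, 20 heads. The 1600-dim / 48-layer values in the traceback above
# belong to the GPT-2 XL config, which does not match this checkpoint.
config = GPT2Config(n_embd=1280, n_layer=36, n_head=20)
model = GPT2LMHeadModel(config)

# "large_ft.pkl" is a placeholder for the downloaded checkpoint; the .pkl file
# is assumed to be a plain state dict saved with torch.save.
state_dict = torch.load("large_ft.pkl", map_location="cpu")

# strict=False surfaces any remaining key-name differences instead of raising;
# inspect the returned lists to make sure nothing important was skipped.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
model.eval()
```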
Hello. The embedding size is indeed 1280. Please check the configs/ folder for more details. Regarding the repetitive generation problem, you can refer to another issue where there is some discussion about the third-party decoding script implementation details.
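In case it helps, here is a minimal decoding sketch using the Hugging Face `generate` API with sampling and n-gram blocking, which usually curbs the "red, red, red" style loops of greedy decoding. The Hub id `microsoft/DialoGPT-large`, the prompt, and the generation parameters are illustrative assumptions, not the reference decoding script:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Assumes the Hub checkpoint microsoft/DialoGPT-large; a locally loaded model
# (as in the sketch above) works the same way.
tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-large")
model = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-large")
model.eval()

# Encode one user turn; DialoGPT separates turns with the EOS token.
input_ids = tokenizer.encode("What is your favorite color?" + tokenizer.eos_token,
                             return_tensors="pt")

with torch.no_grad():
    reply_ids = model.generate(
        input_ids,
        max_length=200,
        do_sample=True,          # sample instead of greedy argmax
        top_k=50,                # keep only the 50 most likely tokens
        top_p=0.95,              # nucleus sampling
        temperature=0.8,         # soften the distribution slightly
        no_repeat_ngram_size=3,  # block verbatim 3-gram loops
        pad_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens and decode only the generated reply.
print(tokenizer.decode(reply_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy or beam search on open-domain dialogue models tends to collapse into loops, so some combination of sampling, temperature, and repetition blocking is usually needed regardless of which decoding script is used.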