
Error in large model's config

Open agi1512 opened this issue 4 years ago • 1 comment

Hello. The large model trained from scratch has a wrong config, resulting in the errors below:

RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel: Missing key(s) in state_dict: "transformer.h.36.ln_1.weight", "transformer.h.36.ln_1.bias", ... , "transformer.h.47.mlp.c_proj.weight", "transformer.h.47.mlp.c_proj.bias".

size mismatch for transformer.wte.weight: copying a param with shape torch.Size([50257, 1280]) from checkpoint, the shape in current model is torch.Size([50257, 1600]).
size mismatch for transformer.wpe.weight: copying a param with shape torch.Size([1024, 1280]) from checkpoint, the shape in current model is torch.Size([1024, 1600]).
size mismatch for transformer.h.0.ln_1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([1600]).
size mismatch for transformer.h.0.ln_1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([1600]).
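[Editor's note: one quick way to confirm what the checkpoint actually contains is to inspect the pickled state dict directly. This is a diagnostic sketch, not the repo's code; the .pkl path is a placeholder. The expected outputs in the comments follow from the error messages above (1280-dim tensors, and missing keys starting at transformer.h.36, implying blocks h.0 through h.35).]

```python
# Diagnostic sketch: inspect the pickled state dict to see which
# architecture it was saved from. "large_ft.pkl" is a placeholder path.
import torch

sd = torch.load("large_ft.pkl", map_location="cpu")
print(sd["transformer.wte.weight"].shape)  # expect torch.Size([50257, 1280])

# Count transformer blocks by their index in keys like "transformer.h.12.ln_1.weight".
layer_ids = {int(k.split(".")[2]) for k in sd if k.startswith("transformer.h.")}
print(max(layer_ids) + 1)                  # expect 36 layers, not 48
```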

This happens when loading the config and then loading the pickled pretrained file. If I change the config to 1280 embeddings it works, but I get strange interactions: after some chatting, the model starts repeating the same answer over and over. Maybe the issue is in my decoding script (it's handwritten, not yours), but I keep getting mumbles such as:

S a, u m? M.

or repeating answers such as:

My favorite color is red, red, red, red, red

or

i'm from r all. i'm from r all. i'm from r all.

etc.

agi1512 — Nov 12 '19, 10:11
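[Editor's note: for reference, a minimal sketch of loading the checkpoint with a config that matches what the errors above imply the checkpoint contains (36 layers, 1280-dim embeddings, i.e. GPT-2 large dimensions). The file name, the head count, and the strict=False fallback are assumptions, not the repo's own loading script.]

```python
# Sketch: build a GPT2Config matching the checkpoint's actual shapes,
# then load the pickled state dict into a freshly constructed model.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50257,
    n_positions=1024,
    n_embd=1280,   # matches the [*, 1280] tensors in the checkpoint
    n_layer=36,    # the checkpoint's last block is transformer.h.35
    n_head=20,     # assumed GPT-2 large head count; 1280 / 20 = 64-dim heads
)
model = GPT2LMHeadModel(config)

state_dict = torch.load("large_ft.pkl", map_location="cpu")  # placeholder path
# strict=False tolerates key-name differences (e.g. tied lm_head weights)
# between the saved checkpoint and the current model class.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(missing, unexpected)  # ideally empty, or only head/tied-weight keys
model.eval()
```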

Hello. The embedding size is indeed 1280. Please check the configs/ folder for more details. As for the repetitive generation problem, you can refer to another issue where there is some discussion of the third-party decoding script's implementation details.

dreasysnail — Nov 13 '19, 13:11
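[Editor's note: on the decoding side, a hedged sketch of a sampling-based decode with a recent transformers version. Greedy argmax decoding is a common cause of loops like "red, red, red". `model` is the instance from the loading sketch above; the generate arguments shown (do_sample, top_k, top_p, temperature, no_repeat_ngram_size) are standard transformers options, but the specific values here are assumptions, not the settings the maintainers recommend.]

```python
# Sketch: sampling-based decoding to mitigate repetitive replies.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")  # DialoGPT reuses the GPT-2 BPE vocab

# DialoGPT separates dialogue turns with the EOS token.
prompt = "What is your favorite color?" + tokenizer.eos_token
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(          # model from the loading sketch above
    input_ids,
    max_length=input_ids.shape[-1] + 40,
    do_sample=True,                   # sample instead of greedy argmax
    top_k=50,                         # restrict to the 50 most likely tokens
    top_p=0.9,                        # nucleus sampling
    temperature=0.8,
    no_repeat_ngram_size=3,           # blocks verbatim 3-gram loops
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Sampling plus an n-gram repeat block trades a little coherence for variety, which is usually the right trade for open-ended chat; degenerate repetition under greedy decoding is a well-known failure mode of likelihood-trained language models.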