
Error in large model's config

Open agi1512 opened this issue 4 years ago • 1 comment

Hello. The large model trained from scratch has a wrong config, resulting in the errors below:

RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel: Missing key(s) in state_dict: "transformer.h.36.ln_1.weight", "transformer.h.36.ln_1.bias", ... , "transformer.h.47.mlp.c_proj.weight", "transformer.h.47.mlp.c_proj.bias".

size mismatch for transformer.wte.weight: copying a param with shape torch.Size([50257, 1280]) from checkpoint, the shape in current model is torch.Size([50257, 1600]).
size mismatch for transformer.wpe.weight: copying a param with shape torch.Size([1024, 1280]) from checkpoint, the shape in current model is torch.Size([1024, 1600]).
size mismatch for transformer.h.0.ln_1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([1600]).
size mismatch for transformer.h.0.ln_1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([1600]).
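[Editor's note: one quick way to confirm what the checkpoint actually contains is to inspect the pickled state dict directly. This is a diagnostic sketch, not the repo's code; the .pkl path is a placeholder. The expected outputs in the comments follow from the error messages above (1280-dim tensors, and missing keys starting at transformer.h.36, implying blocks h.0 through h.35).]

```python
# Diagnostic sketch: inspect the pickled state dict to see which
# architecture it was saved from. "large_ft.pkl" is a placeholder path.
import torch

sd = torch.load("large_ft.pkl", map_location="cpu")
print(sd["transformer.wte.weight"].shape)  # expect torch.Size([50257, 1280])

# Count transformer blocks by their index in keys like "transformer.h.12.ln_1.weight".
layer_ids = {int(k.split(".")[2]) for k in sd if k.startswith("transformer.h.")}
print(max(layer_ids) + 1)                  # expect 36 layers, not 48
```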

This happens when loading the config and then loading the pickled pretrained file. If I change the config to 1280 embeddings it works, but I get strange interactions: after some chatting, the model starts repeating the same answer over and over. Maybe the issue is in my decoding script (it's handwritten, not yours), but I keep getting mumbles such as:

S a, u m? M.

or repeating answers such as:

My favorite color is red, red, red, red, red

or

i'm from r all. i'm from r all. i'm from r all.

etc.

agi1512 — Nov 12 '19, 10:11
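[Editor's note: for reference, a minimal sketch of loading the checkpoint with a config that matches what the errors above imply the checkpoint contains (36 layers, 1280-dim embeddings, i.e. GPT-2 large dimensions). The file name, the head count, and the strict=False fallback are assumptions, not the repo's own loading script.]

```python
# Sketch: build a GPT2Config matching the checkpoint's actual shapes,
# then load the pickled state dict into a freshly constructed model.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50257,
    n_positions=1024,
    n_embd=1280,   # matches the [*, 1280] tensors in the checkpoint
    n_layer=36,    # the checkpoint's last block is transformer.h.35
    n_head=20,     # assumed GPT-2 large head count; 1280 / 20 = 64-dim heads
)
model = GPT2LMHeadModel(config)

state_dict = torch.load("large_ft.pkl", map_location="cpu")  # placeholder path
# strict=False tolerates key-name differences (e.g. tied lm_head weights)
# between the saved checkpoint and the current model class.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(missing, unexpected)  # ideally empty, or only head/tied-weight keys
model.eval()
```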

Hello. The embedding size is indeed 1280. Please check the configs/ folder for more details. As for the repetitive generation problem, you can refer to another issue where there is some discussion of the third-party decoding script's implementation details.

dreasysnail — Nov 13 '19, 13:11
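[Editor's note: on the decoding side, a hedged sketch of a sampling-based decode with a recent transformers version. Greedy argmax decoding is a common cause of loops like "red, red, red". `model` is the instance from the loading sketch above; the generate arguments shown (do_sample, top_k, top_p, temperature, no_repeat_ngram_size) are standard transformers options, but the specific values here are assumptions, not the settings the maintainers recommend.]

```python
# Sketch: sampling-based decoding to mitigate repetitive replies.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")  # DialoGPT reuses the GPT-2 BPE vocab

# DialoGPT separates dialogue turns with the EOS token.
prompt = "What is your favorite color?" + tokenizer.eos_token
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(          # model from the loading sketch above
    input_ids,
    max_length=input_ids.shape[-1] + 40,
    do_sample=True,                   # sample instead of greedy argmax
    top_k=50,                         # restrict to the 50 most likely tokens
    top_p=0.9,                        # nucleus sampling
    temperature=0.8,
    no_repeat_ngram_size=3,           # blocks verbatim 3-gram loops
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Sampling plus an n-gram repeat block trades a little coherence for variety, which is usually the right trade for open-ended chat; degenerate repetition under greedy decoding is a well-known failure mode of likelihood-trained language models.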