lit-llama
[solved] Using adapter_v2.py to fine-tune LLaMA 13B fails with a size-mismatch error
When I use the Alpaca data to fine-tune LLaMA 13B on 4x A100 80GB GPUs, I get the following errors:
RuntimeError: Error(s) in loading state_dict for LLaMA:
size mismatch for lm_head.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for transformer.wte.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for transformer.h.0.rms_1.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([15360, 5120]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.h.0.attn.c_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.h.0.rms_2.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.0.mlp.c_fc1.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
size mismatch for transformer.h.0.mlp.c_fc2.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([5120, 13824]) from checkpoint, the shape in current model is torch.Size([4096, 11008]).
size mismatch for transformer.h.1.rms_1.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.1.attn.c_attn.weight: copying a param with shape torch.Size([15360, 5120]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.h.1.attn.c_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.h.1.rms_2.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.1.mlp.c_fc1.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
size mismatch for transformer.h.1.mlp.c_fc2.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
size mismatch for transformer.h.1.mlp.c_proj.weight: copying a param with shape torch.Size([5120, 13824]) from checkpoint, the shape in current model is torch.Size([4096, 11008]).
size mismatch for transformer.h.2.rms_1.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for transformer.h.2.attn.c_attn.weight: copying a param with shape torch.Size([15360, 5120]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.h.2.attn.c_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
I'm sure I downloaded the correct version of LLaMA 13B, and I used convert_hf_checkpoint.py to convert the xx.bin files to xx.pth.
I only modified adapter_v2.py by changing devices = 1 to devices = 4 and batch_size = 64 / devices to batch_size = 128 / devices.
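For context, the mismatched shapes in the traceback are exactly the 7B vs. 13B widths: the default LLaMAConfig that adapter_v2.py constructs uses the 7B dimensions (n_embd = 4096), while the downloaded 13B checkpoint was saved with n_embd = 5120. A minimal sketch that makes this visible, assuming the lit-llama repo is on the Python path:

# Compare the default config (7B-sized) with the 13B config.
from lit_llama.model import LLaMAConfig

default_cfg = LLaMAConfig()             # what adapter_v2.py builds (block_size aside): 7B dimensions
cfg_13b = LLaMAConfig.from_name("13B")  # dimensions the 13B checkpoint was saved with

print(default_cfg.n_embd)  # 4096 -> "the shape in current model" in the traceback
print(cfg_13b.n_embd)      # 5120 -> "copying a param with shape ... from checkpoint"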
Same error here. Did you solve it? @ChaoyuHuang
Nope, I tried reinstalling CUDA 11.8, but it didn't work.
Try changing this line in adapter_v2.py:
config = LLaMAConfig(block_size=max_seq_length)
to this:
config = LLaMAConfig.from_name("13B")
config.block_size = max_seq_length
The default LLaMAConfig has the 7B dimensions, so the script builds a 7B-sized model and then tries to load the 13B weights into it. Note that from_name is a classmethod: chaining it as LLaMAConfig(block_size=max_seq_length).from_name("13B") also gives a 13B config, but it silently discards the block_size argument, which is why block_size is set afterwards here.
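With the 13B config, the rebuilt model's parameter shapes line up with the checkpoint and load_state_dict stops raising size-mismatch errors. A quick shape check, sketched under the assumptions that the lit-llama repo is importable, that PyTorch 2.0's device context manager is available, and that 256 is the script's max_seq_length:

import torch
from lit_llama.model import LLaMA, LLaMAConfig

config = LLaMAConfig.from_name("13B")
config.block_size = 256  # assumed max_seq_length; keep whatever value adapter_v2.py uses

# Build on the meta device so the 13B parameters are never actually allocated.
with torch.device("meta"):
    model = LLaMA(config)

print(model.lm_head.weight.shape)  # torch.Size([32000, 5120]) -- matches the checkpoint shape in the traceback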