
[Solved] Error when fine-tuning LLaMA 13B with adapter_v2.py

ChaoyuHuang opened this issue on Jun 21, 2023 · 4 comments

When I use the Alpaca data to fine-tune LLaMA 13B on 4×A100 80GB GPUs, I get the following errors:

RuntimeError: Error(s) in loading state_dict for LLaMA:
        size mismatch for lm_head.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
        size mismatch for transformer.wte.weight: copying a param with shape torch.Size([32000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
        size mismatch for transformer.h.0.rms_1.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([15360, 5120]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
        size mismatch for transformer.h.0.attn.c_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
        size mismatch for transformer.h.0.rms_2.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for transformer.h.0.mlp.c_fc1.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
        size mismatch for transformer.h.0.mlp.c_fc2.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
        size mismatch for transformer.h.0.mlp.c_proj.weight: copying a param with shape torch.Size([5120, 13824]) from checkpoint, the shape in current model is torch.Size([4096, 11008]).
        size mismatch for transformer.h.1.rms_1.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for transformer.h.1.attn.c_attn.weight: copying a param with shape torch.Size([15360, 5120]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
        size mismatch for transformer.h.1.attn.c_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
        size mismatch for transformer.h.1.rms_2.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for transformer.h.1.mlp.c_fc1.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
        size mismatch for transformer.h.1.mlp.c_fc2.weight: copying a param with shape torch.Size([13824, 5120]) from checkpoint, the shape in current model is torch.Size([11008, 4096]).
        size mismatch for transformer.h.1.mlp.c_proj.weight: copying a param with shape torch.Size([5120, 13824]) from checkpoint, the shape in current model is torch.Size([4096, 11008]).
        size mismatch for transformer.h.2.rms_1.scale: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
        size mismatch for transformer.h.2.attn.c_attn.weight: copying a param with shape torch.Size([15360, 5120]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
        size mismatch for transformer.h.2.attn.c_proj.weight: copying a param with shape torch.Size([5120, 5120]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).

I'm sure I downloaded the right version of LLaMA 13B, and I used convert_hf_checkpoint.py to convert the xx.bin files to xx.pth.

ChaoyuHuang avatar Jun 21 '23 15:06 ChaoyuHuang
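
The shapes in the traceback already point at the cause: 4096 is the hidden size of the 7B model, which is what the default LLaMAConfig builds, while the checkpoint being loaded has a hidden size of 5120, which belongs to the 13B model. A minimal sketch to confirm this, assuming the config presets in lit_llama/model.py (field names are taken from the repo and may differ):

    from lit_llama.model import LLaMAConfig

    # The default config is what adapter_v2.py constructs; it matches the 7B shapes.
    default_cfg = LLaMAConfig()
    print(default_cfg.n_embd)       # expected: 4096 (7B hidden size)

    # The "13B" preset matches the checkpoint shapes reported in the error above.
    cfg_13b = LLaMAConfig.from_name("13B")
    print(cfg_13b.n_embd)           # expected: 5120 (13B hidden size)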

I only modified adapter_v2.py by changing devices = 1 to devices = 4, and batch_size = 64 / devices to batch_size = 128 / devices.

ChaoyuHuang avatar Jun 21 '23 15:06 ChaoyuHuang
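
For context, the relevant hyperparameter block near the top of adapter_v2.py looks roughly like the sketch below (micro_batch_size and gradient_accumulation_iters are assumptions based on the lit-llama repo). These edits only change the data-parallel device count and the effective batch size, so they would not by themselves produce a shape mismatch:

    # Sketch of the hyperparameter edits described above (layout assumed)
    devices = 4                    # was: devices = 1
    batch_size = 128 / devices     # was: batch_size = 64 / devices
    micro_batch_size = 4
    # effective batch = micro_batch_size * gradient_accumulation_iters * devices
    gradient_accumulation_iters = batch_size // micro_batch_size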

Same error, did you solve it? @ChaoyuHuang

MasterEndless avatar Jun 22 '23 20:06 MasterEndless

Nope, I tried reinstalling CUDA 11.8, but it didn't work.

ChaoyuHuang avatar Jun 23 '23 06:06 ChaoyuHuang

Try changing this line in adapter_v2.py: config = LLaMAConfig(block_size=max_seq_length) to this: config = LLaMAConfig(block_size=max_seq_length).from_name("13B")

LamOne1 avatar Jun 25 '23 08:06 LamOne1
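
For reference, a minimal sketch of that fix. Since from_name appears to be a classmethod, chaining it onto an instance works but discards the block_size argument; building the 13B config first and then setting block_size keeps both (attribute names assumed from the lit-llama repo):

    # Original line in adapter_v2.py (builds the default config, whose shapes are 7B):
    # config = LLaMAConfig(block_size=max_seq_length)

    # Build the config from the "13B" preset so module shapes match the checkpoint,
    # then re-apply the block_size override explicitly:
    config = LLaMAConfig.from_name("13B")
    config.block_size = max_seq_length

With the 13B shapes in place, loading the converted 13B checkpoint should no longer hit the size mismatches shown above.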