
Qwen2 error when loading from checkpoint

Open NilanEkanayake opened this issue 1 year ago • 2 comments

Loading the base model works as expected, but when a LoRA checkpoint is loaded in place of the base model, unsloth returns: Unsloth cannot patch Attention layers with our manual autograd engine since either LoRA adapters are not enabled or a bias term (like in Qwen) is used.

I'm using the latest git version of unsloth. The model still loads, but inference returns gibberish.
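A minimal sketch of the loading path that triggers this, in case it helps reproduce (the checkpoint path and settings are placeholders, not the exact ones I used):

```python
from unsloth import FastLanguageModel

# Loading the base Qwen2 model directly works fine.
# Loading the LoRA checkpoint directory instead prints the warning above:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/checkpoint-500",  # hypothetical LoRA checkpoint dir
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # generation then produces gibberish
```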

Thanks.

NilanEkanayake · May 16 '24 16:05

Also, if the trained model is merged/saved and then loaded, this is returned:

Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at qwen-0.5-ft and are newly initialized: ['model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.v_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

...and it also generates gibberish during inference.
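For reference, the merge/save step looks roughly like this (a sketch using unsloth's documented merged-save helper; the output directory matches the warning above, other values are placeholders):

```python
# Merge the LoRA adapters into the base weights and save to disk.
model.save_pretrained_merged("qwen-0.5-ft", tokenizer, save_method="merged_16bit")

# Re-loading the merged folder then emits the "newly initialized" bias warning:
from transformers import AutoModelForCausalLM, AutoTokenizer

merged = AutoModelForCausalLM.from_pretrained("qwen-0.5-ft")
tok = AutoTokenizer.from_pretrained("qwen-0.5-ft")
```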

NilanEkanayake · May 16 '24 16:05

Hmmm weird - I'll check this, sorry about the issue.

danielhanchen · May 17 '24 18:05

Any update on this?

findalexli · Aug 20 '24 16:08

Let me re-prioritize this!

danielhanchen · Aug 24 '24 00:08

Getting the same error with my LoRAs trained on gemma-2-27B.

Sehyo · Sep 06 '24 15:09

@Sehyo do you know if the issue is now solved? Thanks

shimmyshimmer · Oct 28 '24 04:10