Why is lm_head in modules_to_save? Why not "norm"?
It makes sense that "embed_tokens" should be specified in "modules_to_save" since that is not a linear layer.
But lm_head is a linear layer, so why not allow LoRA to be applied there?
Also, why not allow "norm" to be made trainable by adding to "modules_to_save"?
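For context, a minimal sketch of the distinction being asked about, using the standard PEFT `LoraConfig` (module names assume a Llama-style architecture; values are placeholders):

```python
from peft import LoraConfig

# target_modules: linear layers that get low-rank LoRA adapters.
# modules_to_save: layers that are copied and trained in full, at full rank.
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    # embed_tokens is an nn.Embedding, so it goes in modules_to_save;
    # the question is why lm_head (a plain nn.Linear) is treated the same way.
    modules_to_save=["embed_tokens", "lm_head"],
)
```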
Sadly, making norm trainable would need gradients for the layernorms, which are horrifying to write up in Triton
Thanks @danielhanchen , noted on the norms.
And why not allow LoRA to be applied to lm_head?
@RonanKMcGovern Oh it can be done! It's not a normal thing to do, but it can be enabled - hmmm
Makes sense. Yeah, I don't have strong evidence that it's needed, but I recall reading about making both embed_tokens AND the norms trainable for best performance in chat fine-tuning (when setting/changing the chat template).
Oh if norms and embed_tokens and everything is enabled, that's literally full finetuning, except the weight updates are low rank :))
The layernorm gradients are just way too tedious to derive, sadly
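To make the "it can be enabled" point concrete, a sketch assuming the plain PEFT API (not necessarily what unsloth's patched path supports out of the box):

```python
from peft import LoraConfig

# Applying LoRA to lm_head instead of fully training it: list it in
# target_modules so it gets a low-rank adapter like any other nn.Linear.
config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    # No modules_to_save entry for lm_head, so no full-rank copy is trained.
)
```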
Hi @danielhanchen,
I'm in the process of moving my LLM finetuning to unsloth. I'm impressed with the speed it gives, but I struggle to get the same results as before. Inspecting adapter_config.json, I noticed that "lm_head", which I had in "target_modules", is moved to "modules_to_save" in unsloth. Why is that?
I also noticed that with this change the model overfits to the training data more quickly than before.
If you turn on training for lm_head, then it might overfit, which is normal. I normally suggest just leaving it out
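For reference, a minimal sketch of a setup with lm_head (and embed_tokens) left out entirely; the model name and parameter values below are illustrative placeholders in the usual README-style call:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative choice
    max_seq_length=2048,
    load_in_4bit=True,
)

# LoRA on the attention and MLP projections only; lm_head, embed_tokens and
# the norms stay frozen, which helps avoid the overfitting described above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)
```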
Hopefully it's all solved now? By the way we have new docs! https://docs.unsloth.ai/