Daniel Han comments

Results: 781 comments of Daniel Han

Model running slower after onnx conversion and quantization

Thanks @shaon-chowdhury for debugging and helping! Appreciate it :) Sorry, sadly I'm not an expert on ONNX, so I can't be of much help :(

Models not pushing to specified username (organisation)

Oh my, I will check this

Phi-3 small (7B) and medium (14B)

@rwl4 Currently we support Phi-3 Mini via https://colab.research.google.com/drive/1NvkBmkHfucGO3Ve9s1NKZvMNlw5p83ym?usp=sharing and https://huggingface.co/unsloth/Phi-3-mini-4k-instruct-bnb-4bit

Phi-3 small (7B) and medium (14B)

@rwl4 @JackCloudman @joshib123 We support Phi-3 Medium and Mini now! See https://github.com/unslothai/unsloth/releases/tag/May-2024 (also includes Colabs) Small is still in the works! Please update Unsloth for local machines. For Colab or...

Phi-3 small (7B) and medium (14B)

@joshib123 I don't think there's a bug - that probably means your learning rate is too high

Phi-3 small (7B) and medium (14B)

@anakin87 No sorry - Small is a vastly different architecture :(

Why is lm_head in modules_to_save? Why not "norm"?

Sadly norm will need gradients for the layernorms, which are horrifying to write up in Triton
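
To give a feel for why the norm gradients are tedious, here is a minimal pure-Python sketch of the textbook LayerNorm forward and backward pass for a single row. It is not Unsloth's Triton kernel, and the function names `layernorm_forward` and `layernorm_backward` are hypothetical, chosen only for this illustration.

```python
import math

def layernorm_forward(x, w, b, eps=1e-5):
    """Normalize one row x, then scale by w and shift by b."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    rstd = 1.0 / math.sqrt(var + eps)
    xhat = [(xi - mu) * rstd for xi in x]
    y = [xh * wi + bi for xh, wi, bi in zip(xhat, w, b)]
    # xhat and rstd are kept because the backward pass reuses them.
    return y, xhat, rstd

def layernorm_backward(dy, w, xhat, rstd):
    """Gradients of the loss w.r.t. x, w and b, given dy = dLoss/dy."""
    n = len(dy)
    dw = [g * xh for g, xh in zip(dy, xhat)]  # per-row contribution to dLoss/dw
    db = list(dy)                             # per-row contribution to dLoss/db
    dxhat = [g * wi for g, wi in zip(dy, w)]
    # Two row-wide reductions couple every input element to every output.
    m1 = sum(dxhat) / n
    m2 = sum(d * xh for d, xh in zip(dxhat, xhat)) / n
    dx = [rstd * (d - m1 - xh * m2) for d, xh in zip(dxhat, xhat)]
    return dx, dw, db

# Illustrative values only.
x = [0.5, -1.2, 3.0, 0.7]
w = [1.0, 0.5, -2.0, 1.5]
b = [0.1, 0.0, -0.3, 0.2]
dy = [0.3, -0.7, 1.1, 0.4]
y, xhat, rstd = layernorm_forward(x, w, b)
dx, dw, db = layernorm_backward(dy, w, xhat, rstd)
```

Training the norm weights means `dw` and `db` also have to be summed across every row in the batch, which is an extra reduction that a fused kernel would need to handle on top of `dx`.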

Why is lm_head in modules_to_save? Why not "norm"?

@RonanKMcGovern Oh it can be done! It's not a normal thing to do, but it can be enabled - hmmm

Why is lm_head in modules_to_save? Why not "norm"?

Oh if norms and embed_tokens and everything is enabled, that's literally full finetuning, except the weight updates are low rank :)) The layernorm's gradients are just way too tedious...
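
As a rough illustration of what "low rank" means for the weight updates, the sketch below compares the trainable parameter count of a full update to one linear layer against a rank-r update of the form W + B·A. The layer size 4096 and rank 16 are assumed values for illustration, not taken from the thread, and the helper names are hypothetical.

```python
def full_update_params(d_in, d_out):
    """Trainable parameters when the whole (d_out x d_in) weight is updated."""
    return d_in * d_out

def low_rank_update_params(d_in, d_out, r):
    """Trainable parameters for a rank-r update B @ A,
    where A is (r x d_in) and B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed sizes, for illustration only.
d_in = d_out = 4096
r = 16

full = full_update_params(d_in, d_out)
low_rank = low_rank_update_params(d_in, d_out, r)
fraction = low_rank / full
```

Every weight in the layer still changes under the low-rank update, but the change is constrained to rank at most r, which is why it trains far fewer parameters.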

Why is lm_head in modules_to_save? Why not "norm"?

If you turn on training the lm_head, then it might overfit, which is normal - I normally suggest just leaving it out