Daniel Han
@thedarkzeno Oh I just added a fix for embed_tokens and lm_head :) You might have to update Unsloth :)
@thedarkzeno On that note - do you know if the losses align now? :)
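For anyone wanting to try the embed_tokens / lm_head fix, here's a minimal sketch of how the extra targets are typically passed in Unsloth - the checkpoint name and hyperparameters are placeholders, and exact behaviour depends on your Unsloth version:
```python
from unsloth import FastLanguageModel

# Placeholder checkpoint and settings - adjust for your own setup.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-2-7b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Adding embed_tokens and lm_head to the trainable targets, which is what
# the fix above concerns; argument handling may differ by version.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    use_gradient_checkpointing = True,
)
```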
@thedarkzeno I'm assuming it's the layernorms - we don't actually support FFT (full fine-tuning) since the layernorm gradients are more involved to calculate, hence the difference
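To give a sense of why the layernorm backward is more involved than a plain elementwise op, here's a reference sketch in plain PyTorch (not Unsloth's actual kernel) of the gradient for y = (x - mu) / sigma * gamma + beta over the last dimension:
```python
import torch

def layernorm_backward(dy, x, gamma, eps = 1e-5):
    """Reference backward for y = (x - mu) / sigma * gamma + beta,
    normalized over the last dim. Note the two mean terms: every element
    of a row contributes to every gradient entry in that row, which is
    what makes this heavier than an elementwise op."""
    mu      = x.mean(-1, keepdim = True)
    var     = x.var(-1, unbiased = False, keepdim = True)
    inv_std = torch.rsqrt(var + eps)
    x_hat   = (x - mu) * inv_std

    dx_hat = dy * gamma
    dx = inv_std * (
        dx_hat
        - dx_hat.mean(-1, keepdim = True)
        - x_hat * (dx_hat * x_hat).mean(-1, keepdim = True)
    )
    # Parameter gradients: sum over every position except the hidden dim.
    dgamma = (dy * x_hat).reshape(-1, x.shape[-1]).sum(0)
    dbeta  = dy.reshape(-1, x.shape[-1]).sum(0)
    return dx, dgamma, dbeta
```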
@quancore I'm not sure if vLLM allows serving in 4 or 8 bits! 16bit yes, but unsure on 4 or 8
@patleeman Oh ye AWQ is great - I'm assuming you want to quantize it to AWQ?
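If AWQ is the goal, here's a rough sketch of how that conversion usually goes with the AutoAWQ library - the paths and quant settings are placeholders, and you'd merge the fine-tune to 16bit first:
```python
# Sketch of converting a merged 16bit fine-tune to AWQ with AutoAWQ.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/merged-16bit-model"   # placeholder
quant_path = "path/to/model-awq"            # placeholder

quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Runs AWQ calibration and writes the 4bit weights to disk.
model.quantize(tokenizer, quant_config = quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```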
@ziemowit-s I'll check this out! Sorry on the issue!
@ziemowit-s @its5Q Apologies on the issues again :( Still debugging stuff so sorry on that!
Actually I can confirm - batched inference is in fact breaking - I'm working on a fix asap - sorry for the wait guys!
@ziemowit-s @its5Q Many apologies on the delay - I temporarily fixed it by disabling Unsloth's fast inference paths - it seems like I need to dig deeper on why this...
@ziemowit-s @its5Q I think I finally fixed it!! On the example @ziemowit-s provided me: ``` [' The text emphasizes the benefits of humor in the healing process, including reducing stress,...