Daniel Han

Results: 781 comments of Daniel Han

@thedarkzeno Oh I just added a fix for embed_tokens and lm_head :) You might have to update Unsloth :)
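
For anyone hitting this, a minimal sketch of what training embed_tokens and lm_head alongside the usual LoRA layers looks like with Unsloth (the model name and hyperparameters below are placeholders, not from the original comment):

```python
# A rough sketch, assuming Unsloth's usual FastLanguageModel workflow.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # placeholder checkpoint
    max_seq_length = 2048,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",   # also train the embeddings and output head
    ],
)
```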

@thedarkzeno On that note - do you know if the losses align now? :)

@thedarkzeno I'm assuming it's the layernorms - we don't actually support FFT (full fine-tuning) since the layernorms' gradients are more involved to calculate, hence the difference
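
For reference, a quick way to check which parameters actually train in a LoRA setup (a generic transformers/peft sketch, not Unsloth-specific; gpt2 is just a placeholder base model) - the layernorm weights should show up as frozen, which is where the loss gap against full fine-tuning comes from:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; the point is only to see which parameters train.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float32)
model = get_peft_model(model, LoraConfig(r=8, target_modules=["c_attn"]))

trainable = {n for n, p in model.named_parameters() if p.requires_grad}
norm_like = [n for n, _ in model.named_parameters()
             if ".ln_" in n or "norm" in n.lower()]

print(f"{len(trainable)} trainable parameter tensors")
print("Trainable layernorm params:",
      [n for n in norm_like if n in trainable] or "none")
```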

@quancore I'm not sure if vLLM allows serving in 4 or 8 bits! 16-bit yes, but I'm unsure about 4 or 8

@patleeman Oh ye AWQ is great - I'm assuming you want to quantize it to AWQ?
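
If so, a rough sketch of quantizing a merged checkpoint with the AutoAWQ package (the paths and quant config below are placeholders, and it assumes the LoRA weights were already merged and saved to disk):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

merged_dir = "merged-model"       # placeholder: merged 16-bit checkpoint
quant_dir = "merged-model-awq"    # placeholder: output path

model = AutoAWQForCausalLM.from_pretrained(merged_dir)
tokenizer = AutoTokenizer.from_pretrained(merged_dir)

# Typical 4-bit AWQ settings; adjust group size / version as needed.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_dir)
tokenizer.save_pretrained(quant_dir)
```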

@ziemowit-s @its5Q Apologies on the issues again :( Still debugging stuff so sorry on that!

Actually I can confirm - batched inference is in fact breaking - I'm working on a fix ASAP - sorry for the wait, guys!
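
For anyone wanting to reproduce, a minimal batched-generation sketch along the lines of what's breaking (the model name and prompts are placeholders; assumes a CUDA GPU):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # placeholder checkpoint
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)   # enable Unsloth's fast inference path

tokenizer.padding_side = "left"          # decoder-only models should pad on the left
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "Summarize the benefits of humor in the healing process.",
    "List three ways to reduce stress.",
]
batch = tokenizer(prompts, return_tensors = "pt", padding = True).to("cuda")

outputs = model.generate(**batch, max_new_tokens = 64, use_cache = True)
print(tokenizer.batch_decode(outputs, skip_special_tokens = True))
```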

@ziemowit-s @its5Q Many apologies for the delay - I temporarily fixed it by disabling Unsloth's fast inference paths - it seems like I need to dig deeper on why this...

@ziemowit-s @its5Q I think I finally fixed it!! On the example @ziemowit-s provided me: ``` [' The text emphasizes the benefits of humor in the healing process, including reducing stress,...