Daniel Han
@Etherll Unsure if you know anything about this
@vwxyzjn Thanks for tagging me :) Hi :) @smartliuhw Oh yes we noticed the same issue with `packing=True` causing high losses in our blog: https://unsloth.ai/blog/gemma-bugs  > The most important...
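A minimal sketch of where that flag usually lives, assuming a recent TRL where `packing` sits on `SFTConfig` (older versions take it directly as an `SFTTrainer` argument) - the workaround discussed above is simply to leave packing off until the loss issue is resolved:

```python
from trl import SFTConfig, SFTTrainer

# Hypothetical config just to show the flag; model/dataset names are placeholders.
config = SFTConfig(
    output_dir="outputs",
    packing=False,  # the flag in question; True packs multiple samples into one sequence
)
# trainer = SFTTrainer(model=model, train_dataset=dataset, args=config)
```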
@kdcyberdude Oh you're not supposed to do `labels[:, :-1] = labels[:, 1:].clone()` --> Gemma and Unsloth already do that shift internally, so you're doing it twice now. Yes group by length...
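To illustrate the point (a sketch, not Unsloth's actual code): Hugging Face causal LM forward passes already shift logits and labels against each other when computing the loss, so the labels you feed in should just mirror `input_ids` (with ignored positions set to -100); shifting them yourself means the shift happens twice.

```python
import torch

# Roughly what happens inside a causal LM loss:
#   shift_logits = logits[..., :-1, :]
#   shift_labels = labels[..., 1:]
input_ids = torch.tensor([[1, 15, 27, 42, 2]])

labels = input_ids.clone()            # correct: labels are an unshifted copy
# labels[:, :-1] = labels[:, 1:]      # wrong: adds a second shift on top of the internal one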
Sorry on the delay - I'm assuming maybe you're looking for `llama-server`?
@StrangeTcy Did you set `fp16 = True` or `bf16 = True` in the trainer args? PS if these are Kaggle install instructions - there are updated ones here: https://www.kaggle.com/danielhanchen/kaggle-llama-3-2-1b-3b-unsloth-notebook
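For reference, a minimal sketch of those trainer args (standard `transformers.TrainingArguments`, which Unsloth notebooks typically pass through to the TRL trainer) - pick exactly one of the two flags depending on what the GPU supports:

```python
import torch
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    fp16=not torch.cuda.is_bf16_supported(),  # older GPUs (e.g. T4) -> fp16
    bf16=torch.cuda.is_bf16_supported(),      # Ampere or newer -> bf16
)
```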
@StrangeTcy Ok that looks like a bitsandbytes issue - will investigate
Thanks @matthewdouglas ! :) Sorry on the issue @StrangeTcy
I think I've fixed it, hopefully? There was a recent inference issue in Unsloth
Sorry on the delay - sadly DPO does in fact use more memory than normal finetuning - I'm working on reducing VRAM usage which should definitely help
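Not from the thread, but for context: DPO keeps a frozen reference model alongside the policy, which is a big part of the extra memory. A sketch of the usual stop-gaps while the VRAM work lands, assuming a TRL-style trainer that accepts standard `TrainingArguments` fields:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,   # smallest micro-batch
    gradient_accumulation_steps=8,   # keep the effective batch size the same
    gradient_checkpointing=True,     # trade compute for activation memory
)
```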
Also, I forgot to add: overwriting data on the original matrix DOES NOT make the LAPACK function slower. LAPACK will always overwrite the data, so the speed is the same.
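A small SciPy sketch of what this means in practice: LAPACK's `getrf` always writes the LU factors over whatever buffer it is given; the `overwrite_a` flag only controls whether SciPy passes your original array or a private copy, so the factorization itself runs at the same speed either way.

```python
import numpy as np
from scipy.linalg import lu_factor

A = np.random.rand(1000, 1000)

lu, piv = lu_factor(A.copy())               # SciPy factors a copy; A is untouched
lu2, piv2 = lu_factor(A, overwrite_a=True)  # A's own buffer may now hold the factors
```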