Daniel Han

1145 comments by Daniel Han

@Etherll Unsure if you know anything about this

@vwxyzjn Thanks for tagging me :) Hi :)

@smartliuhw Oh yes, we noticed the same issue with `packing=True` causing high losses in our blog: https://unsloth.ai/blog/gemma-bugs ![image](https://github.com/huggingface/trl/assets/23090290/c93142a4-8d7c-4aa3-9a3a-5b86dfb0f551) > The most important...
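For anyone hitting this, a minimal sketch of the workaround (assuming a recent TRL where `packing` lives on `SFTConfig`; the model name and `imdb` slice are placeholders, swap in your own):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("imdb", split="train[:200]")  # placeholder dataset

# packing=True concatenates examples into fixed-length blocks; since that is
# what triggers the inflated losses above, keep it off until the bug is fixed.
config = SFTConfig(output_dir="outputs", packing=False, dataset_text_field="text")
trainer = SFTTrainer(model="facebook/opt-350m", args=config, train_dataset=dataset)
trainer.train()
```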

@kdcyberdude Oh you're not supposed to do `labels[:, :-1] = labels[:, 1:].clone()` --> Gemma and Unsloth internally already do that, so you're doing it twice now. Yes, group by length...
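To make the double shift concrete, here's a small plain-PyTorch illustration (not Unsloth's internal code): the causal-LM loss already compares `logits[..., :-1]` against `labels[..., 1:]`, so pre-shifting the labels yourself means each position is supervised by the token two steps ahead instead of the next one:

```python
import torch

labels = torch.tensor([[10, 11, 12, 13]])

# Manual pre-shift (the line quoted above) -- redundant here:
pre_shifted = labels.clone()
pre_shifted[:, :-1] = labels[:, 1:].clone()

# The loss then shifts once more internally, so position t ends up
# predicting token t+2 rather than token t+1:
doubly_shifted = pre_shifted[:, 1:]
print(doubly_shifted)  # tensor([[12, 13, 13]]) -- shifted one step too far
```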

Sorry for the delay - I'm assuming you're perhaps looking for `llama-server`?

@StrangeTcy Did you set `fp16 = True` or `bf16 = True` in the trainer args? PS: if these are Kaggle install instructions, there are updated ones here: https://www.kaggle.com/danielhanchen/kaggle-llama-3-2-1b-3b-unsloth-notebook
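For reference, a minimal sketch of setting those flags in standard `transformers` `TrainingArguments` (the bf16/fp16 fallback logic here is my assumption; adjust for your GPU):

```python
import torch
from transformers import TrainingArguments

use_bf16 = torch.cuda.is_bf16_supported()  # True on Ampere or newer GPUs

args = TrainingArguments(
    output_dir="outputs",
    bf16=use_bf16,       # prefer bf16 when the hardware supports it
    fp16=not use_bf16,   # otherwise fall back to fp16 mixed precision
)
```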

@StrangeTcy Ok that looks like a bitsandbytes issue - will investigate

Thanks @matthewdouglas! :) Sorry about the issue @StrangeTcy

I think I fixed it (hopefully!) - there was an inference issue in Unsloth recently.

Sorry for the delay - sadly DPO does in fact use more memory than normal finetuning, since it holds a frozen reference model alongside the policy being trained. I'm working on reducing VRAM usage, which should definitely help.
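A back-of-the-envelope sketch of why (the numbers below are illustrative, not measurements):

```python
# Illustrative VRAM arithmetic for model weights only (ignores optimizer
# state, gradients, and activations); all numbers are hypothetical.
params = 7e9          # e.g. a 7B-parameter model
bytes_per_param = 2   # bf16/fp16 weights

policy_gb = params * bytes_per_param / 1e9
reference_gb = params * bytes_per_param / 1e9  # extra frozen model DPO loads

print(f"plain finetuning weights: {policy_gb:.0f} GB")
print(f"DPO weights (policy + reference): {policy_gb + reference_gb:.0f} GB")
```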

Also, I forgot to add: overwriting data on the original matrix DOES NOT make the LAPACK function slower. LAPACK always overwrites its working array, so the speed is the same either way.
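A quick SciPy illustration of the distinction (assuming `scipy.linalg.lu_factor`, which wraps LAPACK's `getrf`): `overwrite_a` only controls whether SciPy makes a defensive copy of the input; the LAPACK routine itself always writes the factors over its working array, so the factorization time is identical:

```python
import numpy as np
from scipy.linalg import lu_factor

a = np.asfortranarray(np.random.rand(1000, 1000))

# overwrite_a=True lets SciPy hand `a` straight to LAPACK's dgetrf, which
# writes the LU factors over it; overwrite_a=False merely adds an input
# copy first. The dgetrf call runs at the same speed in both cases.
lu, piv = lu_factor(a, overwrite_a=True)
```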