Daniel Han

1145 comments by Daniel Han

@Etherll Unsure if you know anything about this

@vwxyzjn Thanks for tagging me :) Hi :)

@smartliuhw Oh yes, we noticed the same issue with `packing=True` causing high losses in our blog: https://unsloth.ai/blog/gemma-bugs ![image](https://github.com/huggingface/trl/assets/23090290/c93142a4-8d7c-4aa3-9a3a-5b86dfb0f551) > The most important...
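For anyone hitting this, a minimal sketch of the workaround (assuming a recent TRL where `packing` lives on `SFTConfig`; the model name and `imdb` slice are placeholders, swap in your own):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("imdb", split="train[:200]")  # placeholder dataset

# packing=True concatenates examples into fixed-length blocks; since that is
# what triggers the inflated losses above, keep it off until the bug is fixed.
config = SFTConfig(output_dir="outputs", packing=False, dataset_text_field="text")
trainer = SFTTrainer(model="facebook/opt-350m", args=config, train_dataset=dataset)
trainer.train()
```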

@kdcyberdude Oh you're not supposed to do `labels[:, :-1] = labels[:, 1:].clone()` --> Gemma and Unsloth internally already do that, so you're doing it twice now. Yes, group by length...
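To make the double shift concrete, here's a small plain-PyTorch illustration (not Unsloth's internal code): the causal-LM loss already compares `logits[..., :-1]` against `labels[..., 1:]`, so pre-shifting the labels yourself means each position is supervised by the token two steps ahead instead of the next one:

```python
import torch

labels = torch.tensor([[10, 11, 12, 13]])

# Manual pre-shift (the line quoted above) -- redundant here:
pre_shifted = labels.clone()
pre_shifted[:, :-1] = labels[:, 1:].clone()

# The loss then shifts once more internally, so position t ends up
# predicting token t+2 rather than token t+1:
doubly_shifted = pre_shifted[:, 1:]
print(doubly_shifted)  # tensor([[12, 13, 13]]) -- shifted one step too far
```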

Sorry for the delay - I'm assuming you're perhaps looking for `llama-server`?

@StrangeTcy Did you set `fp16 = True` or `bf16 = True` in the trainer args? PS: if these are Kaggle install instructions, there are updated ones here: https://www.kaggle.com/danielhanchen/kaggle-llama-3-2-1b-3b-unsloth-notebook
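For reference, a minimal sketch of setting those flags in standard `transformers` `TrainingArguments` (the bf16/fp16 fallback logic here is my assumption; adjust for your GPU):

```python
import torch
from transformers import TrainingArguments

use_bf16 = torch.cuda.is_bf16_supported()  # True on Ampere or newer GPUs

args = TrainingArguments(
    output_dir="outputs",
    bf16=use_bf16,       # prefer bf16 when the hardware supports it
    fp16=not use_bf16,   # otherwise fall back to fp16 mixed precision
)
```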

@StrangeTcy Ok that looks like a bitsandbytes issue - will investigate

Thanks @matthewdouglas! :) Sorry about the issue @StrangeTcy

I think I fixed it (hopefully!) - there was an inference issue in Unsloth recently.

Sorry for the delay - sadly DPO does in fact use more memory than normal finetuning, since it holds a frozen reference model alongside the policy being trained. I'm working on reducing VRAM usage, which should definitely help.
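A back-of-the-envelope sketch of why (the numbers below are illustrative, not measurements):

```python
# Illustrative VRAM arithmetic for model weights only (ignores optimizer
# state, gradients, and activations); all numbers are hypothetical.
params = 7e9          # e.g. a 7B-parameter model
bytes_per_param = 2   # bf16/fp16 weights

policy_gb = params * bytes_per_param / 1e9
reference_gb = params * bytes_per_param / 1e9  # extra frozen model DPO loads

print(f"plain finetuning weights: {policy_gb:.0f} GB")
print(f"DPO weights (policy + reference): {policy_gb + reference_gb:.0f} GB")
```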

Also, I forgot to add: overwriting data on the original matrix DOES NOT make the LAPACK function slower. LAPACK always overwrites its working array, so the speed is the same either way.
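A quick SciPy illustration of the distinction (assuming `scipy.linalg.lu_factor`, which wraps LAPACK's `getrf`): `overwrite_a` only controls whether SciPy makes a defensive copy of the input; the LAPACK routine itself always writes the factors over its working array, so the factorization time is identical:

```python
import numpy as np
from scipy.linalg import lu_factor

a = np.asfortranarray(np.random.rand(1000, 1000))

# overwrite_a=True lets SciPy hand `a` straight to LAPACK's dgetrf, which
# writes the LU factors over it; overwrite_a=False merely adds an input
# copy first. The dgetrf call runs at the same speed in both cases.
lu, piv = lu_factor(a, overwrite_a=True)
```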