Results: 7 comments by Minghao Yan

I disabled all occurrences of bf16. If you need to use bf16, then I am not sure this workaround will help.
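
Roughly what I mean by "disabling bf16", as a minimal toy sketch (the model and shapes here are placeholders, not the actual setup; the real change was in the training configuration):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the real one; the point is only to show
# keeping everything in float32 instead of casting to bfloat16.
model = nn.Linear(16, 16)

# Instead of model.to(torch.bfloat16) or an autocast(dtype=torch.bfloat16)
# region, keep parameters and activations in float32.
model = model.to(torch.float32)

x = torch.randn(4, 16)
out = model(x)          # no bf16 autocast wrapped around the forward pass
print(out.dtype)        # torch.float32
```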

Thanks for your reply! I think I might not be loading the model correctly. I disabled the LoRA implementation entirely and reverted to the default llama-3-8b setup. Currently I am trying to copy...

Thank you very much for the pointer! After some more investigation, it does seem like the first step loss is too high (without any LoRA or any training) after loading...
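
For context, this is the kind of check I mean by "first step loss is too high", as a toy sketch (the model, shapes, and checkpoint path are placeholders, not the real llama-3-8b setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tiny LM head and fake batch, just to show measuring the loss
# of the loaded weights before any optimizer step is taken.
vocab_size, hidden = 128, 32
model = nn.Linear(hidden, vocab_size)

# In the real run the weights come from the checkpoint; the path below is
# a placeholder and is left commented out here.
# model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))

model.eval()
with torch.no_grad():
    hidden_states = torch.randn(4, 16, hidden)       # fake activations
    labels = torch.randint(0, vocab_size, (4, 16))    # fake targets
    logits = model(hidden_states)
    loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))

# A randomly initialized (or incorrectly loaded) model sits near ln(vocab_size);
# a correctly loaded pretrained model should be well below that.
print(loss.item(), torch.log(torch.tensor(float(vocab_size))).item())
```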

Thank you for your reply! I moved load_from_full_model_state_dict to after model.to_empty(...). If I keep the torch device as cpu, the behavior is the same; if I change the device to cuda, it...
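
For clarity, this is the ordering I mean, as a self-contained sketch: materialize the meta-device model with to_empty(...) first, then load the real weights. I substitute plain load_state_dict for the load_from_full_model_state_dict helper so the snippet runs on its own; the model and sizes are placeholders.

```python
import torch
import torch.nn as nn

def build_model() -> nn.Module:
    # Placeholder architecture standing in for the real model.
    return nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))

# A "full" state dict that would normally come from the checkpoint on disk.
full_state_dict = build_model().state_dict()

with torch.device("meta"):
    model = build_model()            # parameters exist only as metadata

# to_empty(...) allocates uninitialized storage on the target device...
model = model.to_empty(device="cpu")  # or device="cuda" on a GPU machine

# ...and only after that are the real weights copied in (in the actual run,
# this is where load_from_full_model_state_dict is called).
model.load_state_dict(full_state_dict)

print(next(model.parameters()).device)
```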

Thank you! I have created a PR here: #427

I was not aware of this, thank you!