loss 0 with load_in_8bit = True
Hello, I ran into an issue where the loss stays at 0 when fine-tuning the ChatGLM model with PEFT and load_in_8bit=True; the same setup trains fine in fp16. For reference, my environment is a V100 16GB with peft 0.3.0.dev0 and bitsandbytes 0.37.1. Could you please take a look and share any insights on how to resolve this? Thank you.
Hi @chuckhope, can you share a small reproducible script? Thanks!
My script is based on finetune.py from https://github.com/mymusise/ChatGLM-Tuning/tree/master.
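Roughly, it ends up with a setup like the one below; the checkpoint name, target modules, and LoRA hyperparameters are illustrative, not necessarily the exact values finetune.py uses:

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "THUDM/chatglm-6b"  # assumed checkpoint

model = AutoModel.from_pretrained(
    model_name,
    load_in_8bit=True,   # the setting that coincides with the loss staying at 0
    device_map="auto",
    trust_remote_code=True,
)

# Freeze the int8 weights, cast norm layers to fp32, and make inputs require grads
model = prepare_model_for_int8_training(model)

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # ChatGLM fuses q/k/v into this projection
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```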
Encountered the same issue! Training OPT-125M with bnb on summarization.
Encountered the same issue! Training ALPACA_LORA-13B.
Encountered the same issue! Training llama-33b.
@matrixssy @bhavnicksm @chuckhope Sorry to bother you, but have you solved your problem yet?
Just wanted to leave a note here that I was encountering the same issue with 8bit training on StableLM and it went away somehow.
I'm not exactly sure what fixed it but between the last time I got loss=0 and now, I did the following:
- Rebooted the machine.
- Changed the JSON training dataset I use so it no longer has a "train" attribute. I suspect this is what truly fixed it.
- Removed an extra model.cuda() call before starting training (see the sketch at the end of this comment).
- Possibly rebuilt the Docker image I use for training, which would have pulled in the latest pip package versions.
Sorry this isn't much help but I figured I'd post in case it helps anyone else.
If I have some time after I finish training I'll try to undo the dataset changes to see if the issue re-emerges.
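For concreteness, this is roughly the shape of the setup that now trains with a non-zero loss; the checkpoint name and LoRA settings below are placeholders, not my exact config:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-base-alpha-7b",  # placeholder checkpoint
    load_in_8bit=True,
    device_map="auto",  # the quantized weights are already placed on the GPU here
)
model = prepare_model_for_int8_training(model)
model = get_peft_model(
    model,
    LoraConfig(
        task_type="CAUSAL_LM",
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["query_key_value"],  # StableLM (GPT-NeoX-style) attention projection
    ),
)

# model.cuda()  # <- the extra call I removed; with device_map="auto" the model
#                    is already on the GPU, so this was redundant at best
```

From there the model goes straight into a standard training loop with no further .cuda() or .half() calls on it.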
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.