
Loss stays at 0 with load_in_8bit=True


Hello, I ran into an issue where the loss stays at 0 when fine-tuning the ChatGLM model with PEFT and load_in_8bit=True. The same model trains fine in fp16. For reference, my environment is a V100 16GB, peft 0.3.0.dev0, and bitsandbytes 0.37.1. Could you please take a look and share any insight on how to resolve this? Thank you.
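
For reference, here is a minimal sketch of the kind of setup that shows the symptom for me; the model id, LoRA settings, and training arguments below are illustrative placeholders rather than my exact script:

```python
from transformers import AutoModel, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model

model_name = "THUDM/chatglm-6b"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Loading in 8-bit is what correlates with the zero loss;
# the same model loaded in fp16 trains normally for me.
model = AutoModel.from_pretrained(
    model_name,
    load_in_8bit=True,
    trust_remote_code=True,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # the fused attention projection in ChatGLM
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    fp16=True,
    logging_steps=10,
    num_train_epochs=1,
)

# train_dataset is built from tokenized prompt/response pairs, as in finetune.py.
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
# trainer.train()  # the logged loss stays at 0.0 from the first step
```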

chuckhope · Mar 30 '23

Hi @chuckhope, can you share a small reproducible script? Thanks!

younesbelkada · Mar 30 '23

My script is based on finetune.py from https://github.com/mymusise/ChatGLM-Tuning/tree/master.

chuckhope · Mar 30 '23

Encountered the same issue! Training OPT-125M with bnb on summarization.

bhavnicksm · Mar 31 '23

Encountered the same issue! Training ALPACA_LORA-13B.

matrixssy · Apr 11 '23

Encountered the same issue! Training llama-33b.

iMountTai · Apr 21 '23

@matrixssy @bhavnicksm @chuckhope Sorry to bother you, but have you solved your problem yet?

iMountTai · Apr 22 '23

Just wanted to leave a note here that I was encountering the same issue with 8-bit training on StableLM, and it went away somehow.

I'm not exactly sure what fixed it but between the last time I got loss=0 and now, I did the following:

  1. Rebooted the machine
  2. Changed the JSON training dataset I used to no longer have a "train" attribute. I suspect this is what truly fixed it.
  3. Removed an extra model.cuda() call before starting the training (see the sketch below this list).
  4. Possibly rebuilt the Docker image I use for training, which would have pulled in the latest pip package versions.
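
For context, a rough sketch of how the 8-bit setup looks without the extra .cuda() call; the model id and LoRA target modules are placeholders, and the prepare_model_for_int8_training call is the usual recommendation for 8-bit PEFT training rather than necessarily what my script had:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# device_map="auto" places the 8-bit weights on the GPU;
# nothing should call model.cuda() on the model afterwards.
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-base-alpha-7b",  # placeholder model id
    load_in_8bit=True,
    device_map="auto",
)

# Commonly recommended prep step for 8-bit training:
# freezes the base weights and casts layer norms to fp32.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # placeholder; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# From here the PEFT model goes straight into the Trainer / training loop,
# with no extra model.cuda() or model.half() calls on the 8-bit base model.
```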

Sorry this isn't much help but I figured I'd post in case it helps anyone else.

If I have some time after I finish training I'll try to undo the dataset changes to see if the issue re-emerges.

daramos · Apr 25 '23

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] · May 20 '23