TencentPretrain

pretrain.py of llama-7b model, Exception: Current loss scale already at minimum

Open liukaiyueyuo opened this issue 1 year ago • 2 comments

When I run pretrain.py with the llama-7b model, it raises the exception below:

Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.

What is the problem, and how can I solve it?

liukaiyueyuo avatar Apr 26 '23 06:04 liukaiyueyuo

My GPU server:
GPU: single A100 (80 GB); memory: 128 GB

liukaiyueyuo avatar Apr 26 '23 07:04 liukaiyueyuo

I think it's worth trying bf16. It is supported on the A100 and should avoid the loss scale problem.
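A rough sketch of what that switch could look like, assuming the run is driven by a DeepSpeed config JSON (the error text appears to come from DeepSpeed's fp16 dynamic loss scaler, though that is an inference from the message, not something confirmed in this thread). The `fp16` and `bf16` sections are standard DeepSpeed config keys; the helper name `switch_fp16_to_bf16` and the example values are hypothetical and only for illustration. bf16 keeps the same exponent range as fp32, so no loss scaling is involved.

```python
import copy
import json


def switch_fp16_to_bf16(config: dict) -> dict:
    """Return a copy of a DeepSpeed-style config dict that uses bf16 instead of fp16."""
    new_config = copy.deepcopy(config)
    # Drop the fp16 section together with its dynamic loss-scaling settings.
    new_config.pop("fp16", None)
    # Enable bf16; it has fp32's exponent range, so no loss scale is needed.
    new_config["bf16"] = {"enabled": True}
    return new_config


# Hypothetical starting config; the values in a real setup may differ.
example_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 selects dynamic loss scaling
        "initial_scale_power": 16,
        "min_loss_scale": 1,
    },
}

bf16_config = switch_fp16_to_bf16(example_config)
print(json.dumps(bf16_config, indent=2))
```

The resulting JSON would then replace the fp16 config passed to pretrain.py. If the model weights were saved in fp16, some values may need checking after the change, so treat this as a starting point only.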

Abolfazl-kr avatar Jan 13 '24 05:01 Abolfazl-kr