TencentPretrain
pretrain.py of llama-7b model, Exception: Current loss scale already at minimum
When I run pretrain.py with the llama-7b model, it fails with the exception below:
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
What is the problem, and how can I solve it?
My GPU server:
GPU: single A100 (80 GB)
Memory: 128 GB
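I think it's worth trying bf16. The A100 supports it, and it should avoid the loss scale problem: bf16 has the same exponent range as fp32, so it does not need loss scaling the way fp16 does.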
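In case it helps, here is a rough sketch of the switch. It assumes the run is launched with a DeepSpeed-style JSON config (the error message looks like the one DeepSpeed's dynamic loss scaler raises), and that the config has an `fp16` section. The helper name `switch_fp16_to_bf16` and the example values are made up for illustration; adapt them to whatever config file you actually pass to pretrain.py.

```python
import copy
import json


def switch_fp16_to_bf16(ds_config):
    """Return a copy of a DeepSpeed-style config with fp16 removed and bf16 enabled.

    Hypothetical helper for illustration; it is not part of TencentPretrain.
    """
    new_config = copy.deepcopy(ds_config)
    # Drop the fp16 section (and its loss-scale settings); fp16 and bf16
    # should not both be enabled at the same time.
    new_config.pop("fp16", None)
    # bf16 needs no loss scaling, so the section only has the enable flag.
    new_config["bf16"] = {"enabled": True}
    return new_config


# Example config with illustrative values, not taken from the repository.
example_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {
        "enabled": True,
        "loss_scale": 0,  # 0 selects dynamic loss scaling
        "initial_scale_power": 16,
        "min_loss_scale": 1,
    },
}

bf16_config = switch_fp16_to_bf16(example_config)
print(json.dumps(bf16_config, indent=2))
```

If you would rather stay on fp16, the same error usually means the gradients kept overflowing until the scaler hit `min_loss_scale`, so a lower learning rate is another thing to try, but bf16 is the simpler change on an A100.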