ChatGLM-Finetuning
Fine-tuning chatglm3 on an RTX 4090 fails with this error: Current loss scale already at minimum - cannot decrease scale anymore
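The log below follows the pattern of fp16 dynamic loss scaling. On every overflow, DeepSpeed skips the optimizer step and halves the loss scale. Once the scale reaches its floor (1 here), one more overflow aborts the run with the error above. The Python model below is a simplified, illustrative sketch and not DeepSpeed's actual code. `DynamicLossScalerSketch` is a hypothetical name, and the defaults (initial scale 2**16, hysteresis 2, minimum scale 1, window 1000) are assumptions modeled on common fp16 settings.

```python
class DynamicLossScalerSketch:
    """Simplified, hypothetical model of fp16 dynamic loss scaling."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0, scale_window=1000,
                 min_scale=1.0, hysteresis=2):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.cur_hysteresis = hysteresis
        self.good_steps = 0

    def update_scale(self, overflow: bool) -> None:
        """Call once per step; on overflow the optimizer step is skipped."""
        if overflow:
            self.good_steps = 0
            if self.cur_hysteresis > 1:
                # Hysteresis: tolerate an overflow without shrinking the scale yet.
                self.cur_hysteresis -= 1
                return
            if self.cur_scale <= self.min_scale:
                raise RuntimeError(
                    "Current loss scale already at minimum - cannot decrease scale anymore")
            # Halve the scale (never below the floor), as in "reducing to 2".
            self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
        else:
            self.good_steps += 1
            # After scale_window clean steps, try a larger scale again.
            if self.good_steps % self.scale_window == 0:
                self.cur_scale *= self.scale_factor


# Reproduce the log pattern: every step overflows (skipped == step).
scaler = DynamicLossScalerSketch()
steps = 0
caught = None
try:
    while True:
        scaler.update_scale(overflow=True)
        steps += 1
except RuntimeError as err:
    caught = err

print(f"crashed after {steps} skipped steps at scale {scaler.cur_scale:g}: {caught}")
```

Under these assumptions, the model survives 17 skipped steps before raising. This matches `step=17, skipped=17` in the log: every step from the first iteration overflowed, so the scale collapses to the floor almost immediately.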
ss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4, reducing to 2
[2024-01-17 10:10:24,477] [INFO] [logging.py:96:log_dist] [Rank 0] step=16, skipped=16, lr=[0.0001], mom=[(0.9, 0.95)]
[2024-01-17 10:10:24,478] [INFO] [timer.py:260:stop] epoch=0/micro_step=64/global_step=16, RunningAvgSamplesPerSec=7.620608764200451, CurrSamplesPerSec=7.801349699356398, MemAllocated=13.44GB, MaxMemAllocated=14.82GB
0%| | 67/114599 [00:09<4:29:54, 7.07batch/s]
[2024-01-17 10:10:25,080] [INFO] [loss_scaler.py:183:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2, reducing to 1
[2024-01-17 10:10:25,080] [INFO] [logging.py:96:log_dist] [Rank 0] step=17, skipped=17, lr=[0.0001], mom=[(0.9, 0.95)]
[2024-01-17 10:10:25,081] [INFO] [timer.py:260:stop] epoch=0/micro_step=68/global_step=17, RunningAvgSamplesPerSec=7.547169766053195, CurrSamplesPerSec=6.64997792221485, MemAllocated=13.44GB, MaxMemAllocated=14.82GB
0%| | 71/114599 [00:10<4:44:42, 6.70batch/s]
Traceback (most recent call last):
File "/root/ChatGLM-Finetuning/train.py", line 234, in