DFGN-pytorch
About gradient_accumulate_step
I set a smaller gradient_accumulate_step = 5 and still hit OOM, as shown below. Is the problem that the GPUs are too small? If I use 4 GPUs, how should I allocate them? I assigned two to each in config, but it still errors out.

GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:06.0 Off |                    0 |
| N/A   30C    P0    28W / 250W |  11675MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:00:08.0 Off |                    0 |
| N/A   30C    P0    29W / 250W |   7823MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     21430      C   python                                     11665MiB |
|    1     21430      C   python                                      7813MiB |
+-----------------------------------------------------------------------------+
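For context on the four-GPU question: the usual way PyTorch spreads one model over several GPUs is torch.nn.DataParallel, which scatters each batch across the listed devices and gathers the outputs back on the first one, so GPU 0 typically shows more memory use than the others, as in the nvidia-smi output above. A minimal generic sketch follows; the device ids and the stand-in model are illustrative assumptions, not DFGN's actual config.py wiring.

import torch
from torch import nn

# Generic DataParallel sketch; the stand-in model and device ids are
# illustrative assumptions, not the actual settings in DFGN's config.py.
device_ids = [0, 1, 2, 3]
model = nn.Linear(768, 2)                                  # stand-in for the real network
model = nn.DataParallel(model, device_ids=device_ids).cuda(device_ids[0])

x = torch.randn(16, 768).cuda(device_ids[0])               # one batch of 16 examples
y = model(x)  # the batch is scattered over the 4 GPUs and the outputs are
              # gathered back on device_ids[0], which is why GPU 0 carries
              # the extra memory load seen in nvidia-smi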
Error message:

Avg-LOSS0/batch/step: 6.6137880611419675
Avg-LOSS1/batch/step: 3.8737828445434572
Avg-LOSS2/batch/step: 0.00037345796823501587
Avg-LOSS3/batch/step: 1.449139289855957
Avg-LOSS4/batch/step: 1.2904924607276917
100%|█████████████████████████████████████████| 962/962 [19:47<00:00, 1.06it/s]
  1%|▎                                         | 2/232 [00:02<04:24, 1.15s/it]
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/nesi/nobackup/uoa02874/anaconda3/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "train.py", line 59, in run
    join(args.prediction_path, 'pred_epoch{}.json'.format(epc)))
  File "train.py", line 122, in predict
    start, end, sp, Type, softmask, ent, yp1, yp2 = model(batch, return_yp=True)
  File "/home/zden658/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/scale_wlg_persistent/filesets/project/uoa02874/PycharmProjects/DFGN-pytorch-master/DFGN/model/GFN.py", line 59, in forward
    input_state, entity_state, softmask = self.basicblocks[l](input_state, query_vec, batch)
  File "/home/zden658/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/scale_wlg_persistent/filesets/project/uoa02874/PycharmProjects/DFGN-pytorch-master/DFGN/model/layers.py", line 245, in forward
    entity_state = self.tok2ent(doc_state, entity_mapping, entity_length)
  File "/home/zden658/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/scale_wlg_persistent/filesets/project/uoa02874/PycharmProjects/DFGN-pytorch-master/DFGN/model/layers.py", line 46, in forward
    entity_states = entity_mapping.unsqueeze(3) * doc_state.unsqueeze(1)  # N x E x L x d
RuntimeError: CUDA out of memory. Tried to allocate 1.17 GiB (GPU 0; 11.91 GiB total capacity; 5.55 GiB already allocated; 523.38 MiB free; 5.15 GiB cached)
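The failing line builds a dense N x E x L x d tensor (batch size x entities x sequence length x hidden size), so its memory grows with all four of those and with nothing that gradient_accumulate_step controls. A rough back-of-the-envelope sketch, using assumed illustrative shapes rather than values read from DFGN's config:

# Rough size of entity_mapping.unsqueeze(3) * doc_state.unsqueeze(1)  (N x E x L x d).
# All shapes below are assumptions for illustration, not DFGN's real config values.
N, E, L, d = 8, 40, 512, 768        # batch size, entities per example, sequence length, hidden size
bytes_per_float = 4                 # float32
tensor_bytes = N * E * L * d * bytes_per_float
print(f"N x E x L x d tensor: {tensor_bytes / 1024**3:.2f} GiB")   # about 0.47 GiB for these shapes

Halving N roughly halves that allocation, which is why lowering batch_size helps even when gradient accumulation is enabled.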
I went into config.py and set the batch size to a value that lets training continue. It seems that gradient_accumulate_step = 5 does not reduce the batch size nearly enough.
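That matches how gradient accumulation works: gradient_accumulate_step only delays optimizer.step(), while every forward/backward pass still processes a full batch_size worth of examples, so peak activation memory is unchanged. To cut memory you lower batch_size, and raise gradient_accumulate_step if you want to keep the same effective batch. A minimal generic PyTorch sketch, not DFGN's actual training loop, assuming the model returns a scalar loss:

def train_epoch(model, loader, optimizer, gradient_accumulate_step=5):
    # Generic gradient-accumulation sketch, not DFGN's train.py.
    model.train()
    optimizer.zero_grad()
    for i, batch in enumerate(loader):               # each batch on its own sets the peak memory
        loss = model(batch)                          # assumption: the model returns a scalar loss
        (loss / gradient_accumulate_step).backward() # gradients add up across the small batches
        if (i + 1) % gradient_accumulate_step == 0:
            optimizer.step()                         # effective batch = batch_size * accumulate steps
            optimizer.zero_grad()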