
Question about models/trainer.py#L325

Open zjreno opened this issue 4 years ago • 3 comments

In https://github.com/nlpyang/BertSum/blob/master/src/models/trainer.py#L325 , the loss is a scalar after sum(), so loss.numel() must be 1. What is the difference between (loss/loss.numel()).backward() and loss.backward()?

So I guess loss.numel() was meant to represent n_docs? Can we use loss / normalization instead of (loss / loss.numel())?
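A minimal sketch of why the two calls are equivalent after sum() (the sent_scores/labels names and tensor shapes below are made up for illustration; only loss and normalization correspond to variables in trainer.py):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for per-sentence scores and labels (shapes illustrative only).
sent_scores = torch.rand(2, 5, requires_grad=True)   # (batch_size, n_sents), already in [0, 1]
labels = torch.randint(0, 2, (2, 5)).float()

loss = F.binary_cross_entropy(sent_scores, labels, reduction='none')
loss = loss.sum()                      # scalar tensor, so loss.numel() == 1

# Since the loss is a scalar after sum(), dividing by numel() is a no-op:
assert loss.numel() == 1
(loss / loss.numel()).backward()       # gradients identical to loss.backward()

# Dividing by a batch-level count such as `normalization` (the number of labeled
# sentences accumulated across gradient-accumulation steps) would actually
# rescale the gradients:
#   (loss / normalization).backward()
```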

zjreno avatar May 19 '21 09:05 zjreno

Hi, I have the same problem. What was your conclusion?

Anothernewcomer avatar Dec 14 '21 02:12 Anothernewcomer

Hi, I have a bug at this statement:

Traceback (most recent call last):
  File "train.py", line 340, in <module>
    train(args, device_id)
  File "train.py", line 272, in train
    trainer.train(train_iter_fct, args.train_steps)
  File "/root/code/BertSum/src/models/trainer.py", line 155, in train
    self._gradient_accumulation(
  File "/root/code/BertSum/src/models/trainer.py", line 326, in _gradient_accumulation
    loss.div(float(normalization)).backward()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Does it have any relation to this statement? Or have you solved it? Pardon my poor English!

haidequanbu avatar Aug 10 '22 07:08 haidequanbu

OK, I have already solved the problem. It was caused by the BCE loss used earlier: you should apply a sigmoid layer to the output before computing the loss.
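For anyone hitting the same assert, a minimal sketch of the failure mode and the fix described above (the logits/labels names and shapes are illustrative, not BertSum's actual code): nn.BCELoss expects inputs already in [0, 1], so feeding it raw scores can trigger a device-side assert on CUDA.

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)                     # raw, unbounded scores
labels = torch.randint(0, 2, (4, 10)).float()

criterion = nn.BCELoss(reduction='none')

# Wrong: raw logits can fall outside [0, 1], which BCELoss rejects; on GPU this
# surfaces as "RuntimeError: CUDA error: device-side assert triggered".
# loss = criterion(logits, labels)

# Fix described above: apply a sigmoid before the loss.
loss = criterion(torch.sigmoid(logits), labels).sum()

# Equivalent and more numerically stable: BCEWithLogitsLoss on the raw scores.
loss_alt = nn.BCEWithLogitsLoss(reduction='none')(logits, labels).sum()
```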

haidequanbu avatar Aug 11 '22 06:08 haidequanbu