CDial-GPT

nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed.

Open Alexia1994 opened this issue 2 years ago • 4 comments

ERROR:ignite.engine.engine.Engine:Current run is terminating due to exception: unsupported operand type(s) for /: 'str' and 'int'.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [6,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [12,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [16,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed.
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: unsupported operand type(s) for /: 'str' and 'int'.
Traceback (most recent call last):
  File "train.py", line 237, in <module>
    train()
  File "train.py", line 225, in train
    trainer.run(train_loader, max_epochs=args.n_epochs)
  File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 850, in run
    return self._internal_run()
  File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 952, in _internal_run
    self._handle_exception(e)
  File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 716, in _handle_exception
    raise e
  File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 937, in _internal_run
    hours, mins, secs = self._run_once_on_dataset()
  File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 705, in _run_once_on_dataset
    self._handle_exception(e)
  File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 716, in _handle_exception
    raise e
  File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 688, in _run_once_on_dataset
    self.state.output = self._process_function(self, self.state.batch)
  File "train.py", line 130, in update
    loss = lm_loss / args.gradient_accumulation_steps
TypeError: unsupported operand type(s) for /: 'str' and 'int'

Alexia1994, Nov 16 '21 06:11

After loading the pretrained model thu-coai/CDial-GPT_LCCC-large, I tried to fine-tune it on toy_train.txt and got the error above. How should I deal with this?

Alexia1994, Nov 16 '21 06:11
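
The `TypeError` in the traceback says that `lm_loss` is a `str` at `train.py` line 130. One way this can happen (an assumption, not confirmed anywhere in this thread) is that the model call returns a dict-like object rather than a tuple, so unpacking it yields its keys instead of its values. A minimal sketch of that behaviour, using a plain `OrderedDict` as a hypothetical stand-in for the model output:

```python
from collections import OrderedDict

# Hypothetical stand-in for a dict-like model output; this is not a real model call.
fake_output = OrderedDict(loss=2.5, logits=[0.1, 0.9])

# Unpacking iterates over the keys, so the first element is the string "loss".
key_from_unpacking, *_ = fake_output
print(type(key_from_unpacking).__name__, key_from_unpacking)  # str loss

# Indexing by key returns the numeric value, which can be divided safely.
value_by_key = fake_output["loss"]
scaled = value_by_key / 64
```

If this is what happens in `update`, printing `type(lm_loss)` just before the division would confirm it.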

Hi,

Have you found a solution?

Thanks.

ttppss, Jan 22 '22 20:01

Which command did you run?

silverriver, Apr 26 '22 04:04

Have you checked whether the number of classes (num_class) is set correctly?!

haiduo, Oct 13 '22 17:10
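
The device-side assertion `t >= 0 && t < n_classes` fires when a target value passed to the NLL/cross-entropy loss lies outside the range of valid class indices. Below is a minimal pure-Python sketch for locating such labels before they reach the GPU, assuming the labels are a batch of integer lists. The function name `find_invalid_labels`, the toy values, and `n_classes=10` are hypothetical; `ignore_index` defaults to -100 here to mirror the default of PyTorch's `CrossEntropyLoss`.

```python
def find_invalid_labels(labels, n_classes, ignore_index=-100):
    """Return ((row, col), value) pairs for labels outside [0, n_classes).

    Positions equal to ignore_index are skipped, since the loss ignores them.
    """
    bad = []
    for row_idx, row in enumerate(labels):
        for col_idx, t in enumerate(row):
            if t == ignore_index:
                continue
            if not (0 <= t < n_classes):
                bad.append(((row_idx, col_idx), t))
    return bad


# Toy batch (hypothetical values): -1 and 10 are out of range for 10 classes.
batch_labels = [[5, 2, -1], [7, 10, -100]]
print(find_invalid_labels(batch_labels, n_classes=10))
# [((0, 2), -1), ((1, 1), 10)]
```

If the data pads labels with one value (for example -1) while the loss ignores a different one, the padded positions would show up in this list; likewise, token ids that are greater than or equal to the model's output size would be reported.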