
Pointer-generator crashes when source and target are disjoint

Open Othergreengrasses opened this issue 1 year ago • 12 comments

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1204, in _run_train
    self._run_sanity_check()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1276, in _run_sanity_check
    val_loop.run()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 152, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 137, in advance
    output = self._evaluation_step(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 234, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1494, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 390, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/yoyodyne/models/base.py", line 286, in validation_step
    greedy_predictions = self(batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/yoyodyne/models/pointer_generator.py", line 412, in forward
    predictions = self.decode(
  File "/usr/local/lib/python3.10/dist-packages/yoyodyne/models/pointer_generator.py", line 345, in decode
    output, decoder_hiddens = self.decode_step(
  File "/usr/local/lib/python3.10/dist-packages/yoyodyne/models/pointer_generator.py", line 286, in decode_step
    gen_probs = self.generation_probability(context, hidden, embedded)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/yoyodyne/models/pointer_generator.py", line 72, in forward
    p_gen += self.W_emb(target_embeddings) + self.bias.expand(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/yoyodyne-train", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/yoyodyne/train.py", line 338, in main
    best_checkpoint = train(trainer, model, datamodule, args.train_from)
  File "/usr/local/lib/python3.10/dist-packages/yoyodyne/train.py", line 232, in train
    trainer.fit(model, datamodule, ckpt_path=train_from)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 63, in _call_and_handle_interrupt
    trainer._teardown()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1175, in _teardown
    self.strategy.teardown()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 499, in teardown
    self.accelerator.teardown()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/accelerators/cuda.py", line 75, in teardown
    _clear_cuda_memory()
  File "/usr/local/lib/python3.10/dist-packages/lightning_fabric/accelerators/cuda.py", line 366, in _clear_cuda_memory
    torch.cuda.empty_cache()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 125, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
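Since the error message suggests `CUDA_LAUNCH_BLOCKING=1`, here is a minimal debugging sketch. It rests on an assumption the tracebacks do not confirm: that the device-side assert comes from a symbol index falling outside the range an embedding or scatter operation expects, which is one common cause of this error. The helper `find_out_of_range`, the example batch, and the vocabulary size of 5 are all hypothetical and are not part of yoyodyne.

```python
import os

# Set before the first CUDA call so kernel errors are reported synchronously
# and the traceback points at the operation that actually failed.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def find_out_of_range(indices, vocab_size):
    """Returns (row, column, value) for every index outside [0, vocab_size)."""
    bad = []
    for row, sequence in enumerate(indices):
        for column, value in enumerate(sequence):
            if not 0 <= value < vocab_size:
                bad.append((row, column, value))
    return bad


# Hypothetical batch of symbol indices; 5 is an assumed embedding table size.
batch = [[0, 3, 4], [2, 7, 1]]
print(find_out_of_range(batch, vocab_size=5))  # prints [(1, 1, 7)]
```

Running a check like this on the CPU-side index lists for a failing batch (for a tensor, on the result of `.tolist()`) would show whether any source or target index exceeds the size the model was built with. If it reports nothing, the cause lies elsewhere.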

Othergreengrasses, Feb 01 '24 00:02