Im2Text

CUDA runtime error

Open arunpatala opened this issue 7 years ago • 6 comments

Hi, nice repo. I am running the training example with the given dataset and getting a CUDA runtime error. I am attaching the log file.

log.txt

arunpatala avatar Mar 14 '17 03:03 arunpatala

Hmm I suspect there's something wrong with your cutorch. Can you try th -lcutorch -e "cutorch.test()" and see the results?

da03 avatar Mar 14 '17 04:03 da03

"Completed 76020 asserts in 180 tests with 0 failures and 0 errors". I have tried it on two machines, and both had the error. I was able to test the model but not train it.

arunpatala avatar Mar 14 '17 05:03 arunpatala

@arunpatala Unfortunately, I have also run into the same "device-side assert triggered" problem on both a Titan X Pascal and a Titan X Maxwell. I have checked cutorch but didn't find any problems. Have you solved this problem?

SuperWu090 avatar Apr 14 '17 12:04 SuperWu090

This problem may be due to a recent update of cutorch: https://github.com/torch/cutorch/issues/708. However, after adding CUDA_LAUNCH_BLOCKING=1, it fails in the same way as before.
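For anyone reproducing this, here is a minimal sketch of one way to set CUDA_LAUNCH_BLOCKING=1 for a single run. The variable makes CUDA kernel launches synchronous, so an error is reported at the call that triggered it rather than at some later call. It is a debugging aid, not a fix. The helper name run_blocking is hypothetical and not part of Im2Text, and the stand-in command only echoes the variable; substitute your own training command.

```shell
# Hypothetical helper (not part of Im2Text): run any command with
# synchronous CUDA kernel launches enabled for that command only.
run_blocking() {
    CUDA_LAUNCH_BLOCKING=1 "$@"
}

# Stand-in command that just prints the variable the child process sees.
# Replace it with your actual training command.
run_blocking sh -c 'echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"'
```

Setting the variable on the command rather than exporting it keeps it out of the rest of the shell session, so later runs are not slowed down by synchronous launches.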

SuperWu090 avatar Apr 19 '17 02:04 SuperWu090

Can you try that again? I tracked down a bug that may have led to that problem. @SuperWu090

da03 avatar Apr 19 '17 17:04 da03

@da03 Thanks very much! I have tested the program, and this problem has been solved. However, due to a recent update of OpenNMT in Batch.lua (seems to be 1b7632a7799be84da0ef8e8407002484e38c0fe1), there seems to be a new problem: "~/torch/install/bin/luajit: ~/torch/install/share/lua/5.1/onmt/data/Batch.lua:78: attempt to index a nil value". This problem may be avoided by using the earlier version of OpenNMT (47431c773c2598384ea6f8c2200c25161f2eef12).

SuperWu090 avatar Apr 20 '17 01:04 SuperWu090