An error training an Encoder-Decoder Attention Model

Open qiang2100 opened this issue 7 years ago • 2 comments

When I train an Encoder-Decoder Attention Model using "sh run_std.sh", I get the following error:

/home/qiang/torch/extra/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [56,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
THCudaCheck FAIL file=/home/qiang/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/home/qiang/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /home/qiang/torch/extra/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
[C]: at 0x7fbc8f5b6050
[C]: in function '__index'
layers/EMaskedClassNLLCriterion.lua:18: in function 'forward'
nnets/EncDecAWE.lua:391: in function 'opfunc'
/home/qiang/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'optimMethod'
nnets/EncDecAWE.lua:468: in function 'trainBatch'
train.lua:40: in function 'train'
train.lua:162: in function 'main'
train.lua:269: in function 'main'
train.lua:272: in main chunk
[C]: in function 'dofile'
...iang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405e90
Lock freed

Usage instructions:

To obtain and lock an id: ./gpu_lock.py --id
The lock is automatically freed when the parent terminates

To get an id that won't be freed: ./gpu_lock.py --id-to-hog
You must manually free these ids: ./gpu_lock.py --free

More info: http://homepages.inf.ed.ac.uk/imurray2/code/gpu_monitoring/

qiang2100, Dec 30 '17 12:12

If you switch to CPU mode, you can see more clearly where the error comes from. One of the bugs I fixed was probably caused by the author using an older version of Torch. I fixed it by replacing float with double.

Sanqiang, Apr 23 '18 00:04

Hi @qiang2100 ! I am encountering the same error, did you find what is causing it?

Crista23, Feb 10 '19 02:02
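
A note for anyone hitting the same assertion: `srcIndex < srcSelectDimSize` fails when an index passed to an index-select operation is not smaller than the size of the dimension being indexed. One common cause, not confirmed for this issue, is a token or label id that lies outside the vocabulary or class range. CUDA reports such failures asynchronously, so the Lua line in the traceback may not be where the bad index was used, which is why running in CPU mode helps. The sketch below shows the kind of range check that could be run over the training data beforehand. It is written in plain Python for illustration rather than in the repository's Lua, and the function name `find_out_of_range`, the vocabulary size, and the sample batch are all hypothetical.

```python
def find_out_of_range(indices, size, one_based=True):
    """Return (position, value) pairs for indices outside the valid range.

    With one_based=True the valid range is 1..size (Torch7/Lua convention);
    otherwise it is 0..size-1. Positions are 0-based Python list positions.
    """
    low = 1 if one_based else 0
    high = size if one_based else size - 1
    return [(pos, idx) for pos, idx in enumerate(indices)
            if not (low <= idx <= high)]


# Hypothetical data: a vocabulary of 5 entries and one batch of token ids.
vocab_size = 5
batch = [1, 4, 5, 6, 0]

bad = find_out_of_range(batch, vocab_size)
print(bad)  # [(3, 6), (4, 0)]
```

If a check like this reports any entries, the data or the vocabulary size given to the model would be the first thing to inspect. An empty result would point to a different cause, such as the float/double mismatch mentioned above.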