DPPnet

An illegal memory access was encountered

Open: badripatro opened this issue 7 years ago.

Problem statement: when I run the following command

```
th vqa_train.lua -gpuid 1
```

I get the following message:


```
loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54 done
creating a neural network with random initialization
/home/cse/torch/install/bin/luajit: C++ exception
badri@cse-desktop:/DPPnet-master/004_train_DPPnet_fixed_cnn$
```


I also narrowed it down to line 79 of `DPPnet-master_1/model/HashedNets/HasherME.lua`; the call to `libhashnn.mysort()` there is where it fails:

```lua
libhashnn.mysort(self['sort_key_' .. WorB], self['sort_val_' .. WorB])
```
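To check the `mysort` call in isolation, one option is to compare it against a CPU reference on a small input. The sketch below is hypothetical and is not libhashnn's code: it assumes that `mysort(keys, vals)` sorts `keys` ascending in place and applies the same permutation to `vals` (a sort-by-key, in the style of `thrust::sort_by_key`). The function name `sort_by_key` is invented, and plain Lua tables stand in for Torch tensors.

```lua
-- Hypothetical CPU reference, ASSUMING libhashnn.mysort(keys, vals) is an
-- in-place sort-by-key. Not the library's implementation.
local function sort_by_key(keys, vals)
  -- A length mismatch is the kind of bug a GPU kernel would turn into an
  -- out-of-bounds (illegal) memory access rather than a clean error.
  assert(#keys == #vals, string.format(
    "sort_by_key: %d keys but %d values", #keys, #vals))
  local order = {}
  for i = 1, #keys do order[i] = i end
  -- Sort positions by key; ties are broken by original position so the
  -- result is deterministic (table.sort itself is not stable).
  table.sort(order, function(a, b)
    if keys[a] ~= keys[b] then return keys[a] < keys[b] end
    return a < b
  end)
  local sk, sv = {}, {}
  for i, j in ipairs(order) do
    sk[i], sv[i] = keys[j], vals[j]
  end
  -- Write back in place to mirror an in-place library call.
  for i = 1, #keys do keys[i], vals[i] = sk[i], sv[i] end
  return keys, vals
end

local keys = {3, 1, 2, 1}
local vals = {30, 10, 20, 11}
sort_by_key(keys, vals)
-- keys is now {1, 1, 2, 3} and vals is {10, 11, 20, 30}
```

If the real call produces a different ordering, or the two buffers passed to it have different sizes, that would point at the inputs rather than at the CUDA kernel itself.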

I then commented out line 79, recompiled, and ran the same command again:

```
th vqa_train.lua -gpuid 1
```

This time I get the following message:


```
loading cache..: /home1/badri/badripatro/VQA/workspace_project/image_qa_dpp/DPPnet-master/004_train_DPPnet_fixed_cnn/cache/vqa_data_cache_major_test-dev2015_54 done
creating a neural network with random initialization
initialing weights..
[train2014val2014] set batch order option 1 : shuffle __________________________________________________
THCudaCheck FAIL file=/home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=147 error=77 : an illegal memory access was encountered
/home1/badri/torch/install/bin/luajit: cuda runtime error (77) : an illegal memory access was encountered at /home1/badri/torch/extra/cutorch/lib/THC/generic/THCStorage.c:147
```


I have narrowed this problem down to line 423 of `004_train_DPPnet_fixed_cnn/vqa_train.lua`:

```lua
dlinear_out[i] = HasherME:backward(dhashed_out)
```

Debugging further, I traced it to line 114 of `DPPnet-master_1/model/HashedNets/HasherME.lua`, where the call to `libhashnn.myreduce()` fails:

```lua
libhashnn.myreduce(self.sort_key_W, self.gradOBuffer, self.unique_idxW, self.gradInput, self.buffer_W)
```
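A CPU reference can also help isolate the `myreduce` call above. The sketch below is hypothetical and does not reproduce libhashnn: it assumes that `myreduce` performs a reduce-by-key, summing the gradients of all entries that share a key (for example, virtual weights that hash to the same real weight) into one slot of the output. The names `reduce_by_key`, `sorted_keys`, `grads`, and `n_real` are invented for illustration, and keys are taken to be 1-based indices into the output.

```lua
-- Hypothetical CPU reference, ASSUMING libhashnn.myreduce sums gradients
-- that share a key into the slot that key indexes. Not the library's code.
local function reduce_by_key(sorted_keys, grads, n_real)
  assert(#sorted_keys == #grads, "keys and gradients differ in length")
  local grad_real = {}
  for r = 1, n_real do grad_real[r] = 0 end
  local i = 1
  while i <= #sorted_keys do
    local key = sorted_keys[i]
    -- On the GPU, an index outside [1, n_real] would write to memory the
    -- output buffer does not own, i.e. an illegal memory access.
    if key < 1 or key > n_real then
      error(string.format("key %d at position %d is outside [1, %d]",
                          key, i, n_real))
    end
    -- Sum the run of equal keys; sorting makes each key one contiguous run.
    local sum = 0
    while i <= #sorted_keys and sorted_keys[i] == key do
      sum = sum + grads[i]
      i = i + 1
    end
    grad_real[key] = grad_real[key] + sum
  end
  return grad_real
end

local g = reduce_by_key({1, 1, 2, 4}, {0.5, 0.25, 1, 2}, 4)
-- g[1] == 0.75, g[2] == 1, g[3] == 0, g[4] == 2
```

Running a check like this over the actual contents of `self.sort_key_W` (copied to the CPU) against the size of `self.gradInput` would show whether any key falls out of range, which is one plausible source of error 77.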

The failure always occurs inside `libhashnn`. Does anyone have advice on how I can narrow down the problem further?

badripatro, Sep 24 '16 19:09