
"cudamat.cudamat.CUDAMatException: CUBLAS error." occurs when running the multimodal_dbm example

Open Demoscai opened this issue 11 years ago • 8 comments

Hi Nitish Srivastava @nitishsrivastava, I have a problem when running the multimodal_dbm example:

Train Step: 0
Traceback (most recent call last):
  File "/home/meitu299/deepnet/deepnet/trainer.py", line 60, in <module>
    main()
  File "/home/meitu299/deepnet/deepnet/trainer.py", line 54, in main
    model.Train()
  File "/home/meitu299/deepnet/deepnet/neuralnet.py", line 631, in Train
    self.GetTrainBatch()
  File "/home/meitu299/deepnet/deepnet/neuralnet.py", line 524, in GetTrainBatch
    self.GetBatch(self.train_data_handler)
  File "/home/meitu299/deepnet/deepnet/dbm.py", line 264, in GetBatch
    super(DBM, self).GetBatch(handler=handler)
  File "/home/meitu299/deepnet/deepnet/neuralnet.py", line 512, in GetBatch
    data_list = handler.Get()
  File "/home/meitu299/deepnet/deepnet/datahandler.py", line 627, in Get
    batch = self.gpu_cache.Get(self.batchsize, get_last_piece=self.get_last_piece)
  File "/home/meitu299/deepnet/deepnet/datahandler.py", line 396, in Get
    self.LoadData()
  File "/home/meitu299/deepnet/deepnet/datahandler.py", line 327, in LoadData
    self.data[i] = cm.CUDAMatrix(mat)
  File "/home/meitu299/deepnet/cudamat/cudamat.py", line 195, in __init__
    raise generate_exception(err_code)
cudamat.cudamat.CUDAMatException: CUBLAS error.

My computer has 8G of RAM and 3G of GPU memory, with CUDA 6.0. I followed your INSTALL instructions, but this error always happens. Could you tell me how to resolve it? Is it a bug? Thanks.

Demoscai avatar Jul 25 '14 11:07 Demoscai

Try reducing the batch size from 128 to 100.

cbalint13 avatar Jul 25 '14 13:07 cbalint13

I have tried reducing the batch size to 50, but it doesn't work.

Demoscai avatar Jul 28 '14 07:07 Demoscai

Try setting the "gpu_memory" value in your .pbtxt file to "2G" or "2.5G".
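
For anyone who wants to script that change, here is a minimal sketch. It assumes the setting appears in the .pbtxt as a quoted text field of the form gpu_memory: "4G". The function name set_gpu_memory and the sample fragment are hypothetical and not part of deepnet, so check the result against your own file.

```python
import re

def set_gpu_memory(pbtxt_text, new_value):
    """Return pbtxt_text with every quoted gpu_memory value replaced by new_value."""
    # Assumed field layout: gpu_memory: "<size>" (whitespace around ':' is tolerated).
    pattern = re.compile(r'(gpu_memory\s*:\s*")[^"]*(")')
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(2), pbtxt_text)

# Hypothetical fragment for illustration only; a real deepnet .pbtxt has more fields.
sample = 'data_config {\n  gpu_memory: "4G"\n  main_memory: "8G"\n}\n'
updated = set_gpu_memory(sample, "2G")
print(updated)
```

To apply it to a real file, read the file contents into a string, pass them through set_gpu_memory, and write the result back. Keep a copy of the original first.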

jormansa avatar Jul 31 '14 10:07 jormansa

Thanks, it works now.

Demoscai avatar Aug 19 '14 09:08 Demoscai

Thanks in advance. I have a similar problem. When running the ff example, I changed steps from 1000000 to 10000, batchsize from 100 to 10, gpu_memory from 2G to 0.1G, and main_memory from 4G to 0.7G. But at step 499 it still fails like this:

  File "/home/tbq/Downloads/deepnet-master/deepnet/softmax_layer.py", line 65, in GetLoss
    perf.correct_preds = temp.sum()
  File "/home/tbq/Downloads/deepnet-master/cudamat/cudamat.py", line 720, in sum
    return vdot(self,CUDAMatrix.ones.slice(0,self.shape[0]*self.shape[1]))
  File "/home/tbq/Downloads/deepnet-master/cudamat/cudamat.py", line 1650, in vdot
    raise generate_exception(err_code.value)
cudamat.cudamat.CUDAMatException: CUBLAS error.

My computer has 1G of RAM and 256M of GPU memory, with CUDA 5.5.

When I try the dbm and rbm examples, the same problem occurs. I want to know whether my CPU and GPU fail to meet the requirements. Thanks.

tengshaofeng avatar Oct 15 '14 05:10 tengshaofeng

Sorry, English is not my mother tongue. In addition, my gcc version is 4.6.3.

tengshaofeng avatar Oct 15 '14 05:10 tengshaofeng

In my case, I decreased gpu_mem to 1G in run_all_dbn.sh, even though my GPU has 4G of memory (NVIDIA GeForce GTX 780M, 4096 MB).
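
To compare the settings reported in this thread, the sketch below computes what fraction of the card's physical memory each gpu_memory (or gpu_mem) value asks for. It assumes suffixes K, M and G are binary multiples (1G = 1024^3 bytes), which may not match how deepnet parses them. The helper names parse_size and budget_fraction are hypothetical. The setting appears to budget the data cache only, so a small fraction does not guarantee that the model itself fits.

```python
def parse_size(text):
    """Convert a size string such as "2G", "0.1G" or "256M" to bytes."""
    # Assumption: binary multiples; deepnet's own parsing may differ.
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(float(text))

def budget_fraction(requested, physical):
    """Fraction of the physical GPU memory taken by the requested budget."""
    return parse_size(requested) / parse_size(physical)

# (requested budget, physical GPU memory) pairs mentioned in this thread
reports = [("2G", "3G"), ("2.5G", "3G"), ("0.1G", "256M"), ("1G", "4G")]
for requested, physical in reports:
    fraction = budget_fraction(requested, physical)
    print("%s of %s = %.0f%% of GPU memory" % (requested, physical, 100 * fraction))
```

Note that the 0.1G setting on the 256M card is already a modest fraction yet still failed at step 499, which suggests that on very small GPUs the limit is reached by allocations outside this budget.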

jnhwkim avatar May 11 '15 09:05 jnhwkim

Thank you all. Reducing gpu_mem really helps and the code starts to work, but at the end of training the first layer the error happens again. Is gpu_mem still too large? And what will happen if I reduce gpu_mem further?

Thanks a lot to anyone who can help.

chaojiewang94 avatar Jan 16 '17 02:01 chaojiewang94