SurfaceNetworks

cupy.cuda.driver.CUDADriverError: CUDA_ERROR_NOT_INITIALIZED: initialization error

Open • finerc opened this issue 5 years ago • 3 comments

Thank you for the great code! I have a problem: when I run the program on the GPU, the output is as follows:

    Load data
    Preprocess Dataset
    100% (60000 of 60000) |####################| Elapsed Time: 0:00:20 Time: 0:00:20
    100% (10000 of 10000) |####################| Elapsed Time: 0:00:03 Time: 0:00:03
    Num parameters 90314
    N/A% (0 of 937) | | Elapsed Time: 0:00:00 ETA: --:--:--
    Traceback (most recent call last):
      File "main.py", line 213, in <module>
        main()
      File "main.py", line 155, in main
        outputs = model(inputs, laplacian, mask)
      File "/home/jiang/work/ping/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/jiang/work/ping/SurfaceNetworks/src/mesh_mnist/models.py", line 43, in forward
        x = self._modules['rn{}'.format(i)](L, mask, x)
      File "/home/jiang/work/ping/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/jiang/work/ping/SurfaceNetworks/src/utils/utils_pt.py", line 125, in forward
        xs = [x, SparseBMMFunc()(L, x)]
      File "/home/jiang/work/ping/SurfaceNetworks/src/utils/cuda/sparse_bmm_func.py", line 39, in forward
        col_ind, col_ptr = batch_csr(matrix1._indices(), matrix1.size())
      File "/home/jiang/work/ping/SurfaceNetworks/src/utils/cuda/batch_csr.py", line 39, in __call__
        m.load(bytes(ptx.encode()))
      File "cupy/cuda/function.pyx", line 175, in cupy.cuda.function.Module.load
      File "cupy/cuda/function.pyx", line 176, in cupy.cuda.function.Module.load
      File "cupy/cuda/driver.pyx", line 141, in cupy.cuda.driver.moduleLoadData
      File "cupy/cuda/driver.pyx", line 72, in cupy.cuda.driver.check_status
    cupy.cuda.driver.CUDADriverError: CUDA_ERROR_NOT_INITIALIZED: initialization error
    100% (937 of 937) |########################| Elapsed Time: 0:00:00 Time: 0:00:00

Is there something wrong with how I am running it?

System information

  • Python version: 2.7.15
  • CUDA/cuDNN version: 10.0.130 / 7.5.0
  • GPU model and memory: Nvidia GeForce GTX 980
  • Nvidia driver version: 410.48
  • OS: Linux Ubuntu 18.04

finerc · Apr 13 '19 08:04
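The traceback ends in batch_csr(matrix1._indices(), matrix1.size()), which loads a CUDA kernel through CuPy (m.load(...)) to turn the sparse Laplacian's COO indices into compressed index arrays. To cross-check that step on the CPU, here is a minimal pure-Python sketch of a batched COO-to-CSR conversion. It is not the repository's implementation: the helper name coo_to_csr and its argument layout (three parallel index lists for batch, row and column, with each (batch, row) pair flattened to batch * num_rows + row) are assumptions for illustration. The names col_ind/col_ptr in the traceback suggest the real code may compress by column instead; swapping the roles of row_idx and col_idx gives that variant.

```python
def coo_to_csr(batch_idx, row_idx, col_idx, num_rows):
    """Hypothetical helper: convert batched COO indices to CSR arrays.

    Assumption: (batch, row) pairs are flattened into one row index,
    flat_row = batch * num_rows + row, so the result describes a single
    (num_batches * num_rows) x num_cols matrix.
    """
    num_batches = max(batch_idx) + 1 if batch_idx else 0
    flat_rows = [b * num_rows + r for b, r in zip(batch_idx, row_idx)]

    # CSR groups entries by row: sort by (flat_row, col) and keep the
    # permutation so the caller can reorder its value array the same way.
    order = sorted(range(len(flat_rows)), key=lambda k: (flat_rows[k], col_idx[k]))
    col_ind = [col_idx[k] for k in order]

    # Count nonzeros per flattened row, then prefix-sum into row pointers.
    counts = [0] * (num_batches * num_rows)
    for fr in flat_rows:
        counts[fr] += 1
    row_ptr = [0]
    for c in counts:
        row_ptr.append(row_ptr[-1] + c)
    return col_ind, row_ptr, order
```

The nonzeros of flattened row r then have column indices col_ind[row_ptr[r]:row_ptr[r + 1]], and values[order[k]] is the value that belongs to col_ind[k].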

Hi,

Unfortunately, I am no CUDA expert. Can you try the dev branch that doesn't use cupy?

jiangzhongshi · Apr 16 '19 18:04
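Avoiding CuPy means computing the batched sparse-times-dense product (SparseBMMFunc()(L, x) in the traceback) without a custom kernel. The sketch below is not the dev branch's code; it is a hypothetical pure-Python reference for what that product computes, taking L as COO lists (batch, row, col, value) and x as nested lists of shape [batch][num_cols][features]. In PyTorch itself, torch.sparse.mm multiplies a 2-D sparse matrix by a dense one, so applying it to each batch item is one plausible cupy-free route.

```python
def sparse_bmm(batch_idx, row_idx, col_idx, values, x, num_rows):
    """Hypothetical reference: batched sparse (COO) x dense product.

    x is a nested list of shape [batch][num_cols][features]; the result
    has shape [batch][num_rows][features].
    """
    num_batches = len(x)
    num_features = len(x[0][0]) if x and x[0] else 0
    out = [[[0.0] * num_features for _ in range(num_rows)]
           for _ in range(num_batches)]

    # Each stored entry L[b, i, j] = v adds v * x[b][j] to row i of batch b.
    for b, i, j, v in zip(batch_idx, row_idx, col_idx, values):
        src = x[b][j]
        dst = out[b][i]
        for f in range(num_features):
            dst[f] += v * src[f]
    return out
```

A reference like this is slow, but it makes a convenient oracle: comparing its output with the GPU path on a small batch helps tell a kernel-loading problem apart from a numerical one.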

That error went away after I updated the code, but now I have run into another one. In main() of the mesh_mnist task, line 109:

    laplacian=utils.sparse_cat(laplacian,sample_batch.num_vertices,sample_batch_num_vertices)

steps into utils_pt.sparse_cat, where I saw that sparse_cat() receives a list of tensors with layout torch.sparse_coo. Then, at for (...) value.append(tensor._values()), the interpreter reported that it failed to find a dispatch key 'CPUTensorId' for operator _values(). Following this message, I stepped into the for loop and found that the tensor actually has layout torch.strided, which indicates a dense tensor. Is this a bug in the code, or is something wrong elsewhere? Thank you.

SimonPig · Dec 17 '19 05:12
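The symptom described in the last comment, _values() being called on a tensor whose layout is torch.strided, is what happens when a dense tensor reaches code written for sparse COO tensors, since _values() is meant for sparse tensors only. In PyTorch, a defensive check could compare tensor.layout against torch.strided and call tensor.to_sparse() before reading _indices() and _values(). The pure-Python sketch below mirrors that idea with hypothetical helpers: to_coo converts a dense matrix (nested lists) to COO form, and stack_coo gathers several matrices into one batched COO list, accepting either form. It is not the repository's sparse_cat, whose exact behavior is not shown in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for a 2-D sparse COO matrix.
Coo = namedtuple("Coo", ["rows", "cols", "values", "shape"])


def to_coo(dense):
    """Convert a dense matrix (list of rows) to Coo, keeping nonzeros only."""
    rows, cols, values = [], [], []
    for r, row in enumerate(dense):
        for c, v in enumerate(row):
            if v != 0:
                rows.append(r)
                cols.append(c)
                values.append(v)
    num_cols = len(dense[0]) if dense else 0
    return Coo(rows, cols, values, (len(dense), num_cols))


def stack_coo(matrices):
    """Hypothetical helper: stack 2-D matrices into batched COO lists.

    Dense inputs are converted first instead of failing, the analogue of
    calling tensor.to_sparse() when tensor.layout is torch.strided.
    """
    batch, rows, cols, values = [], [], [], []
    for b, m in enumerate(matrices):
        if not isinstance(m, Coo):  # dense input: convert instead of crashing
            m = to_coo(m)
        batch.extend([b] * len(m.values))
        rows.extend(m.rows)
        cols.extend(m.cols)
        values.extend(m.values)
    return batch, rows, cols, values
```

Whether converting is the right fix, or whether the Laplacians should already arrive sparse from the data loader, depends on code not shown here; the check at least turns a confusing dispatch error into an explicit decision.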