cudnn.torch
Potential Bug with find.lua - Multiple GPUs
Hi,
I have no idea how this cropped up, but require 'cudnn' threw an out-of-memory error:
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-7735/cutorch/init.c line=261 error=2 : out of memory
~/distro/install/share/lua/5.1/trepl/init.lua:389: ~/distro/install/share/lua/5.1/trepl/init.lua:389: ~/distro/install/share/lua/5.1/cudnn/find.lua:165: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7735/cutorch/init.c:261
This is strange because I have 4 GPUs (all TitanX; 2 idle and 2 busy), all of which are detected by cutorch.getDeviceCount(). The error occurs even after explicitly calling cutorch.setDevice() to select an idle device and verifying that the current GPU is indeed idle using cutorch.getDevice() and cutorch.getMemoryUsage().
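Roughly, the sequence I ran looks like this (device 3 stands in for one of my idle GPUs):

require 'cutorch'

print(cutorch.getDeviceCount())   -- 4: all TitanX GPUs are detected
cutorch.setDevice(3)              -- select an idle GPU (example index)
print(cutorch.getDevice())        -- 3: the switch took effect
print(cutorch.getMemoryUsage(3))  -- free/total bytes confirm it is idle

require 'cudnn'                   -- throws the out-of-memory error above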
For some weird reason, calling require 'cudnn' sets the current device to a busy one with all of its memory occupied. After digging a little into the traceback, I found that require 'cudnn' ends up calling find.reset(), which invokes cutorch.synchronizeAll() (find.lua:165 in the traceback above). In cutorch's init.c, that call cycles through all available GPUs and performs a synchronize() on each one.
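In Lua terms, my understanding of what that C code does is roughly the following (just a sketch of the behaviour, not the actual implementation):

local function synchronizeAllSketch()
   local prev = cutorch.getDevice()
   for dev = 1, cutorch.getDeviceCount() do
      cutorch.setDevice(dev)   -- creates a CUDA context on this GPU if none exists yet
      cutorch.synchronize()    -- cudaDeviceSynchronize() for that device
   end
   cutorch.setDevice(prev)     -- restore the previously selected device
end

Since creating a context allocates memory on every device it touches, this would explain the out-of-memory error on the fully occupied GPUs.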
Changing this to cutorch.synchronize() seems to resolve the error, although I don't know if I've broken anything else.
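For reference, this is the one-line change I made (as noted in the EDIT below, it turns out to break other things):

-- before (in find.lua, per the traceback above):
cutorch.synchronizeAll()   -- synchronizes, and creates contexts on, every GPU
-- after:
cutorch.synchronize()      -- synchronizes only the currently selected GPU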
I've tried updating the cudnn, cunn and cutorch modules to the latest versions, and finally also tried a fresh install of torch, to no effect. Please let me know if I'm missing something obvious here.
OS - Ubuntu 14.04
CUDA - 7.5
cuDNN - 5103
GPUs - 4 Nvidia TitanX
The 2 busy GPUs are running TensorFlow, which I think allocates all GPU memory by default.
EDIT - making that change to find.lua breaks the code:
cublas runtime error : library not initialized at /tmp/luarocks_cutorch-scm-1-1387/cutorch/lib/THC/THCGeneral.c:378
I also tried setting CUDA_VISIBLE_DEVICES to a single GPU. This causes a long traceback to be printed:
/tmp/luarocks_cutorch-scm-1-1387/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [53,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
(the same assertion repeats for threads [117,0,0] through [127,0,0])
~/distro/install/share/lua/5.1/nn/Linear.lua:66: cublas runtime error : library not initialized at /tmp/luarocks_cutorch-scm-1-1387/cutorch/lib/THC/THCGeneral.c:378
The preferred method of using a subset of GPUs is setting CUDA_VISIBLE_DEVICES; otherwise torch will try to create a context on all of the GPUs, and with the memory on your "busy" GPUs already allocated, that can fail. Setting CUDA_VISIBLE_DEVICES to a single GPU should work. Do you have a repro where it fails? The errors that you have (cublas not initialized) are totally unrelated to cudnn.torch; it looks like something is wrong with the setup. I also suspect that require 'cutorch' would result in the same error.
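Something like this, for example (device index 2 is just a placeholder for one of your idle GPUs):

# restrict the process to a single physical GPU before starting torch;
# inside the process it then shows up as device 1
CUDA_VISIBLE_DEVICES=2 th -e "require 'cutorch'; print(cutorch.getDeviceCount())"

With only one device visible, torch never touches the busy GPUs, so no context is created on them.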
Thanks for the quick response.
Setting CUDA_VISIBLE_DEVICES to a single GPU should work.
I did just that: I tried it with each idle GPU (one at a time), which leads to the cublas error.
I also suspect that require 'cutorch' would result in the same error.
The reason I posted it on this repo is that require 'cutorch' and require 'cunn' work just fine; it's require 'cudnn' that is the problem. In fact, I used cutorch to verify the current device and memory usage.
looks like something is wrong with the setup
I have all the paths (cuda/cudnn) set correctly. If they were incorrect, then cutorch or cunn shouldn't load either, right?
My environment is set up as follows:
PATH=$PATH:/usr/local/cuda-7.5/bin
LD_LIBRARY_PATH=/home/user/cuda/lib64/:$LD_LIBRARY_PATH
Do you have a repro where it fails?
Not sure what this means. Do you mean example code or a scenario? Just running
require 'cutorch'; require 'cunn'; require 'cudnn'
causes this error.
Also, I checked again just now when all GPUs are idle, and require 'cudnn' loads without any issues. I'm only facing problems when some GPUs are occupied on a multi-GPU server. Also, setting CUDA_VISIBLE_DEVICES to any single GPU causes it to crash (the cublas error above) at all times.
EDIT - This recent cutorch issue seems very relevant to mine.